CHAPTER 1



THE PROBLEM 



The purpose of this study was to determine the feasibility of scoring 
writing samples in addition to objective measures in large-scale assessment 
of written composition and to determine the status of writing assessment 
among local and state education agencies. Feasibility was measured in
terms of cost, time and management factors using a writing sample in 
relation to information yielded as compared with information yielded using 
objective measures. The state of the art was determined by a survey 
completed by practitioners in local and state education agencies. In light 
of the central question of the study, alternatives for scoring writing 
samples were explored. Also, possible management techniques were identified
which might moderate the cost of scoring large numbers of writing samples.

The Research Questions 

This proposed study was designed to determine the feasibility of 
scoring a writing sample in addition to an objective measure of writing in 
a statewide writing assessment. The following questions were addressed: 

1. What is the relationship between students' primary trait score on 
a writing sample and their corresponding score on an objective 
measure of writing mechanics for each group of students in the 
design? 

2. What is the relationship between a syntax score on a writing 
sample and the syntax score on the objective measure for each 
group of students? 

3. What is the relationship between the number of capitalization 
errors on the writing sample and the number of capitalization 
errors on the objective test for each group of students? 

4. What is the relationship between the number of punctuation errors 
on the writing sample and the number of punctuation errors on the 
objective test for each group of students? 

5. What is the relationship between the number of spelling errors on 
the writing sample and the number of spelling errors on the 
objective test for each group of students? 

6. Are the mean scores on the objective test for each group of 
students in the design significantly different from the mean 
scores for every other group? 

7. Are the mean scores for primary trait on the writing sample for 
each group of students in the design significantly different from 
the mean scores for every other group? 

8. Is a profile analysis constructed for each group of students from
data yielded from an evaluation of a writing sample similar to a 
profile analysis constructed for that group from data yielded from 
an objective test of writing? 

9. Is the cost of evaluating a writing sample justified based upon 
additional information yielded? 

10. Is the time involved in a writing sample justified based upon 
additional information yielded? 

11. What is the "state of the art" of writing assessment among SEA's 
and LEA's with respect to the proposal's central question: "what 
are the relative costs of assessing writing by means of 
standardized tests versus judged written essays and what are the 
additional kinds of information yielded by the latter that may 
justify the added cost?" 

12. What are alternatives for scoring writing samples or what are 
other strategies for state-wide assessment in which the added cost 
of detailed analyses can be moderated? 
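
Questions 1 through 5 each ask for the relationship between a score on the
writing sample and the corresponding score on the objective test. One way
such a relationship might be quantified is a correlation coefficient. The
sketch below assumes Pearson's product-moment correlation, which the text
at this point does not name; the function name and the error counts are
hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical spelling-error counts for five students (illustration only).
sample_errors = [0, 2, 5, 1, 3]      # errors counted on the writing sample
objective_errors = [1, 2, 6, 1, 4]   # errors on the objective test
print(round(pearson_r(sample_errors, objective_errors), 2))
```

The same function could be applied separately to each group of students in
the design, once for each pair of measures named in questions 1 through 5.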



Delimitations of the Study 

In order to address the research questions identified in this study, 
two populations were selected. The delimitations of each population are
described below.

Louisiana Student Population 

In order to address the questions involving the relationships between 
student scores from objective measures and corresponding scores on writing
samples, results of the Louisiana state-wide writing assessment were used.
In the 1979 Louisiana Assessment Program, students in grades four, eight,
and eleven were administered proficiency tests in April of that year. All
students at the designated grade levels participated in the program.
Objective tests for all students were scored. However, only a random sample
of 2,500 writing exercises per grade level was selected. Scorers consisted
of twenty-five classroom teachers who had been recommended by their 
respective superintendents. 

Practitioners in Local and State Education Agencies 

In order to determine methods practitioners are using to assess 
writing, a questionnaire was mailed to each of the fifty state education 
agencies and to fifty large city school systems. (See list in the
Appendix.) From the respondents, ten persons were randomly selected to be
invited to a conference to discuss their respective writing assessment 
programs. 

Importance and Significance of the Study



Accountability legislation in forty states has charged state education 
agencies and local education agencies to provide information relative to 
student achievement in the basic skills. The assessment of reading and 
mathematics appears to have been relatively straightforward. However, the
writing assessment has presented a particular challenge in that a variety 
of methods has been used to score writing. Two schools of thought appear 
to be emerging, "one decrying objective testing and the other insisting 
that most important mental processes, including the composing of essays, 
can be measured well by objective items" (Stanley and Hopkins, 1972).

Apparently, objective tests can be constructed to sample the domain of 
rules and thereby reliably measure how well students understand them. How- 
ever, there appears to be some question as to whether a proficiency in the 
mechanics of grammar assures a proficiency in written composition (Coffman, 
1969). 

The concern about how to measure writing competencies of American
students was intensified in October 1975, with the publication of the Writing
Assessment report by the National Assessment of Educational Progress
(NAEP). The findings demonstrated an apparent decline in the quality of
writing by the nation's students (NAEP, 1975).

Following the NAEP report on writing mechanics, the College Entrance
Examination Board, which had recently relied on objective measures, 
announced that it would once again begin requiring a writing sample as a 
part of the test of writing ability of college applicants (Godshalk, 
Swineford, and Coffman, 1966). 

Coffman (1971) recognized that writing samples could be scored with a 
high degree of reliability. At the same time he cautioned that scoring is 
expensive, requiring large amounts of professional time and may, therefore, 
be impractical in large numbers. The question, then, is as follows: Does 
the scoring of a writing sample offer enough additional information to 
justify the professional time and cost involved? 

An underlying objective of assessment is to produce reliable and valid 
information from which instructional decisions can be made that will in- 
crease student achievement in the state. Instructional decisions formula- 
ted solely from objective measures of the mechanics of writing are based on 
the assumption that students can increase their proficiency in writing 
prose by increasing their proficiency in the mechanics of writing. A re- 
view of the literature indicates that several researchers have found a 
positive relationship between certain quantitative measures of writing 
mechanics and the quality of writing (Howerton, et al., 1977). 

Bloom, et al. (1977) examined the relationship between knowledge of 
mechanics and the ability to write compositions. Findings indicated that 
learning mechanics was only weakly related to the quality of compositions 
and then only certain students demonstrated this relationship. Past a
certain level, remedial students demonstrated no significant relationships.
These findings suggest the hypothesis that not all students can transfer
their knowledge of the mechanics of writing and, at the same time, marshal
ideas about a given topic and establish an organization and structure to
convey meaning.

CHAPTER 2



REVIEW OF LITERATURE 



This section of the study includes a review of the literature related 
to assessment of written composition and a summary of a consultation with
identified authorities in the field. Two major computer searches were
conducted in order to establish a working bibliography. One search was 
conducted by Educational Research Systems in Arlington, Virginia and the 
other was conducted through the ERIC System. Printouts were reviewed and 
related articles were ordered. In addition, Dr. William Lutz, consultant
to the study, supplied a working bibliography. Authorities in the field of 
writing as identified by the Project Officer included Dr. Ina Mullis of 
National Assessment of Educational Progress, Dr. Alan Purves of the 
University of Illinois at Urbana and Dr. William Lutz of Rutgers 
University. These consultants were invited to meet with the project 
directors on the campus of Louisiana Tech University on March 20, 1981. 
The purposes of the meeting were: 1) to discuss the status of writing 
assessment, the methods used in assessment of written composition, and the 
problems associated with writing assessment, and 2) to plan the summer
conference with practitioners from local and state education agencies. 

Summary of Related Research

This study represents an attempt to determine the status of 
large-scale writing assessment in the nation. A new emphasis on assessment 
has been initiated by the Accountability Movement which began its sweep 
across the nation in the seventies. This new emphasis on assessment has 
nurtured changes in traditional testing and measurement theory, contributed 
to the development of criterion referenced tests, and promoted the 
specification of minimum competencies expected of students. 

Approximately 40 states are actively developing or using minimum 
competency tests. Writing assessment is a part of the minimum competency 
tests in many states (Education Commission of the States, 1979). The primary
problem facing assessment decision-makers is the question of whether to use 
direct measurement techniques or indirect measurement techniques in the 
assessment of writing. Many authorities classify the two methods of
evaluation of writing as "holistic" and "atomistic." Atomistic tests 
include the conventional multiple-choice lists of usage and mechanics, 
vocabulary tests as a measure of skill in discourse, measures of sentence 
length and complexity, readability formulas, and other measures of 
rhetorical conventions which are quantifiable. A user of atomistic tests 
assumes that the correlation between mastery of the identified feature and 
the art of discourse is close enough to permit predictions about skill in 
writing. Holistic tests include those evaluation techniques which depend 
upon the examination of a sample of writing. According to Cooper, holistic 
tests include holistic scoring techniques, analytic scoring techniques, and 
primary trait scoring techniques. However, other authorities criticize the
grouping of analytic scoring and primary trait scoring with holistic
scoring (Cooper, page 3).

Atomistic tests traditionally have been short-answer tests which yield 
information about particular features of language. An example of such a 
test is the "GED Writing Skills Test." Skills measured include spelling,
punctuation, capitalization, grammar and usage, diction and style, sentence
structure, and logic and organization. The examinee simply recognizes
errors and makes a choice of effectiveness. Another example of an 
atomistic test is the "Written English Expression Placement Test." The 
first part of this test has twenty multiple-choice items on various aspects 
of punctuation and syntax in which the examinee identifies errors. On the 
second part, the examinee must identify which of three versions of a 
sentence is the best one. 

Test reviews in the Mental Measurements Yearbook do not indicate any
indices for objective tests of writing skills which predict writing
ability. The Sequential Test of Educational Progress (STEP) recently
changed its name from "Writing" to "Mechanics in Writing" in response to
the recent criticism of indirect measures of writing (Buros, 1980).
Whether writing ability can validly be measured by multiple choice tests is 
a very old question. Richard Braddock insists that objective tests measure 
only proofreading skills (Braddock, 1979). On the other hand, a classical
ETS study of the 1960's indicated that sixty-minute objective tests of
writing skills can correlate above .70 with a reliable criterion of
composition scores. However, the problem is compounded by the fact that
over 200,000 students took the English Composition Test of the College
State Boards in 1976-76 (Godshalk and Swineford, 1969). Regardless of the
technique which is used, scoring such a large number of writing samples 
simply is not feasible. 

The Conference on College Composition and Communication (CCCC) has 
taken a definite position on direct measures of writing. First, CCCC has 
objected to the inclusion of objective usage tests on the grounds that such 
tests measure copyreading skills rather than the ability to use language. 
Further, CCCC felt that such tests discriminate against minority students 
in that the answers are different from their language patterns. Also, it 
was felt that secondary English teachers would teach to the test and 
neglect experiences in writing (CCCC, 1974). 

Again in 1978 CCCC passed a resolution concerning the use of direct 
measures. The resolution stipulated that no student should be given course 
credit, placed in a remedial writing course, exempted from a required 
writing course, or certified for competency without submitting a writing 
sample. The resolution called for further study of the entire issue of 
testing (CCCC, 1979). 

Direct measures for the evaluation of writing effectiveness are not 
without limitations. Braddock warns that the time limitation will in turn
place limitations on students which will cause them to produce a writing
sample under artificial circumstances (Braddock, 1979). Sanders and
Littlefield concur with this generalization by claiming timed impromptu 
conditions as well as assigned topics limit both motivation and quality 
(Sanders and Littlefield, 1979). Diederich insists that at least two 
writing samples are needed to allow students a chance to do their best 
(Diederich, 1960). Coffman described direct measurement techniques as 
expensive in that they require large amounts of professional time to rate a 
representative sample of papers (Coffman, 1966). The problem remains that,
while direct measurement may be desirable in large-scale assessment, it may
not be feasible.



At the present time many state departments of education and large
school systems are implementing assessment programs which include direct
measurement. The methods of scoring or rating the writing samples vary. A
review of the literature indicates three major techniques, which are
discussed in the following section.

Analytical Scoring 

Analytical scoring is based on the assumption that the quality of 
a writing sample can be judged by comparing the sample to a predetermined 
rubric. The rubric consists of specified characteristics which are 
determined to describe effective writing regardless of the mode. The 
characteristics are usually divided into categories of general merit and 
mechanics. 

A definitive scale was developed by Diederich, French, and Carlton
(Diederich, 1974). In order to define the characteristics of effective
writing, the researchers selected 300 writing samples from a larger number
of college freshman students enrolled in English. One group of students
was assigned the topic, "Who Should Go to College?" A second group of
students was assigned the topic, "Why Should Teenagers Be Treated as
Adults?" From each group a sample of 100 papers was selected. To score
the papers, sixty readers were selected from six different fields 
(Diederich, 1946). 

When the scores assigned by the various groups of readers were 
correlated with the scores assigned by every other group of readers, the 
resulting correlations were low (.31). The scores of English teachers
produced higher correlations than those of any other group (.41)
(Diederich, 1946).

To find out what schools of thought tended to exist among the readers,
the intercorrelations were subjected to factor analysis. The factor
analysis produced the following five clusters: ideas, form, flavor,
mechanics, and wording. Apparently some readers tended to sort papers
based on how well the writer expressed "ideas," while other readers
considered "mechanics" as the primary criterion, and others based their
judgments on "form" or "flavor" or "wording." While these five clusters
had no practical application specified in the original purpose of the
study, they did suggest characteristics which might be included in a
grading system (Diederich, 1946). A result was the formulation of the
analytical method.

Diederich applied the analytical scoring method in grading English 
compositions written by students in three large high schools. Students 
wrote one paper per month on an assigned topic. The five factors 
previously identified were used as the scoring criteria and were assigned 
weights. After using the scale one year, the researchers found that
the scale divided itself into the two categories of general merit and 
mechanics. The category of general merit included ideas and organization 
which were assigned double weight. (See Scale on Next Page.) 

DIEDERICH SCALE

TOPIC ________     READER ________     PAPER ________

                Low         Middle         High
Ideas            2     4      6      8      10
Organization     2     4      6      8      10
Wording          1     2      3      4       5
Flavor           1     2      3      4       5
Usage            1     2      3      4       5
Punctuation      1     2      3      4       5
Spelling         1     2      3      4       5
Handwriting      1     2      3      4       5

Sum             10    20     30     40      50
                 E     D      C      B       A

*Note that more emphasis is given to "Ideas" and "Organization" than to the
others.
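
The weighting on the scale can be sketched in a few lines of code. This is
only an illustration under stated assumptions: each feature is taken to be
rated from 1 (low) to 5 (high), with the ratings for ideas and organization
counted double, which gives the 10-to-50 sum shown on the scale. The
function name and the sample ratings are hypothetical.

```python
# Weights follow the scale: Ideas and Organization count double.
WEIGHTS = {
    "ideas": 2,
    "organization": 2,
    "wording": 1,
    "flavor": 1,
    "usage": 1,
    "punctuation": 1,
    "spelling": 1,
    "handwriting": 1,
}


def diederich_total(ratings):
    """Sum one reader's ratings (1 = low ... 5 = high) using the scale weights."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError("missing ratings for: " + ", ".join(sorted(missing)))
    for feature in WEIGHTS:
        if ratings[feature] not in (1, 2, 3, 4, 5):
            raise ValueError(feature + " must be rated from 1 to 5")
    return sum(WEIGHTS[feature] * ratings[feature] for feature in WEIGHTS)


# Hypothetical ratings from one reader for one paper.
paper = {
    "ideas": 4, "organization": 3, "wording": 3, "flavor": 2,
    "usage": 4, "punctuation": 3, "spelling": 5, "handwriting": 3,
}
print(diederich_total(paper))  # prints 34
```

A paper rated at the lowest point on every feature totals 10 and one rated
at the highest point on every feature totals 50, matching the sum line of
the scale.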



Each feature on the scale was described in detail with high-medium-low
points identified and described along a scoring line for each feature
(Diederich).

I. GENERAL MERIT 

1. Ideas

HIGH. The student has given some thought to the topic and writes what he
really thinks. He discusses each main point long enough to show clearly
what he means. He supports each main point with arguments, examples, or
details; he gives the reader some reason for believing it. His points are
clearly related to the topic and to the main idea or impression he is
trying to convey. No necessary points are overlooked and there is no
padding.

MIDDLE. The paper gives the impression that the student does not really
believe what he is writing or does not fully understand what it means. He
tries to guess what the teacher wants and writes what he thinks will get
by. He does not explain his points very clearly or make them come alive to
the reader. He writes what he thinks will sound good, not what he believes
or knows.

LOW. It is either hard to tell what points the student is trying to make
or else they are so silly that, if he had only stopped to think, he would
have realized that they made no sense. He is only trying to get something
down on paper. He does not explain his points; he only asserts them and
then goes on to something else, or he repeats them in slightly different
words. He does not bother to check his facts, and much of what he writes
is obviously untrue. No one believes this sort of writing — not even the
student who wrote it.

2. Organization 

HIGH. The paper starts at a good point, has a sense of movement, gets
somewhere, and then stops. The paper has an underlying plan that the
reader can follow; he is never in doubt as to where he is or where he is 
going. Sometimes there is a little twist near the end that makes the paper 
come out in a way that the reader does not expect, but it seems quite logi- 
cal. Main points are treated at greatest length or with greatest emphasis, 
others in proportion to their importance. 

MIDDLE. The organization of this paper is standard and conventional.
There is usually a one-paragraph introduction, three main points each
treated in one paragraph, and a conclusion that often seems tacked on or
forced. Some trivial points are treated in greater detail than important
points, and there is usually some dead wood that might better be cut out.

LOW. This paper starts anywhere and never gets anywhere. The main points
are not clearly separated from one another, and they come in a random order 
— as though the student had not given any thought to what he intended to 
say before he started to write. The paper seems to start in one direction, 
then another, then another, until the reader is lost. 

3. Wording 

HIGH. The writer uses a sprinkling of uncommon words or of familiar words
in an uncommon setting. He shows an interest in words and in putting them 
together in slightly unusual ways. Some of his experiments with words may 
not quite come off, but this is such a promising trait in a young writer 
that a few mistakes may be forgiven. For the most part, he uses words 
correctly, but he also uses them with imagination. 

MIDDLE. The writer is addicted to tired old phrases and hackneyed
expressions. If you left a blank in one of his sentences, almost anyone 
could guess what word he would use at that point. He does not stop to 
think how to say something; he just says it in the same way as everyone 
else. A writer may also get a middle rating on this quality if he overdoes 
his experiments with uncommon words; if he always uses a big word when a 
little word would serve his purpose better. 

LOW. This writer uses words so carelessly and inexactly that he gets far
too many wrong. These are not intentional experiments with words in which 
failure may be forgiven; they represent groping for words and using them 
without regard to their fitness. A paper written in a childish vocabulary 
may also get a low rating on this quality, even if no word is clearly 
wrong. 

4. Flavor



HIGH. The writing sounds like a person, not a committee. The writer seems 
quite sincere and candid, and he writes about something he knows, often 
from personal experience. You could not mistake this writing for the 
writing of anyone else. Although the writer may assume different roles in 
different papers, he does not put on airs. He is brave enough to reveal 
himself just as he is. 

MIDDLE. The writer usually tries to appear better or wiser than he really
is. He tends to write lofty sentiments and broad generalities. He does 
not put in the little homely details that show that he knows what he is 
talking about. His writing tries to sound impressive. Sometimes it is 
impersonal and correct but colorless, without personal feeling or 
imagination. 

LOW. The writer reveals himself well enough but without meaning to. His
thoughts and feelings are those of an uneducated person who does not 
realize how bad they sound. His way of expressing himself differs from 
standard English, but it is not his personal style; it is the way 
uneducated people talk in his neighborhood. Sometimes the unconscious 
revelation is so touching that we are tempted to rate it high on flavor, 
but it deserves a high rating only if the effect is intended. 



II. MECHANICS

5. Usage, Sentence Structure 

HIGH. There are no vulgar or "illiterate" errors in usage by present 
standards of informal written English, and there are very few errors in 
points that have been discussed in class. The sentence structure is 
usually correct, even in varied and complicated sentence patterns. 

MIDDLE. There are a few serious errors in usage and several in points that 
have been discussed in class but not enough to obscure meaning. The 
sentence structure is usually correct in familiar sentence patterns but 
there are occasional errors in complicated patterns; errors in parallelism, 
subordination, consistency of tenses, reference of pronouns, etc. 

LOW. There are so many serious errors in usage and sentence structure that
the paper is hard to understand. 

6. Punctuation, Capitals, Abbreviations, Numbers 

HIGH. There are no serious violations of rules that have been taught —
except slips of the pen. Note, however, that modern editors do not require 
commas after short introductory clauses, around nonrestrictive clauses, or 
between short coordinate clauses unless their omission leads to ambiguity
or makes the sentence hard to read. Contractions are acceptable — often 
desirable. 

MIDDLE. There are several violations of rules that have been taught — as
many as usually occur in the average paper. Counts of such errors in high, 
middle, and low papers at various ages and socioeconomic levels would be 
desirable in order to establish standards. 

LOW. Basic punctuation is omitted or haphazard, resulting in fragments,
run-on sentences, etc. 

7. Spelling 

HIGH. Descriptions of spelling levels are most often used in grading test
papers written in class. Since there is insufficient time to make full use 
of the dictionary, spelling standards should be more lenient than for 
papers written at home. The high paper (at ages 14-16) usually has not 
more than five misspellings, and these occur in words that are hard to 
spell. The spelling is consistent; words are not spelled correctly in one 
sentence and misspelled in another — unless the misspelling appears to be 
a slip of the pen. If a poor paper has no misspellings, it gets a high 
rating on spelling, even if no difficult words are used. 

MIDDLE. There are several spelling errors in hard words and a few
violations of basic spelling rules, but no more than one finds in the 
average paper. Spelling standards differ so sharply from grade to grade
and from one socioeconomic level to another that each school would do well 
to make a distribution of spelling errors per hundred words (at least for 
test papers written in class) and relate its ratings to this distribution. 

LOW. There are so many spelling errors that they interfere with
comprehension.

8. Handwriting, Neatness 

HIGH. The handwriting is clear, attractive, and well spaced, and the rules
of manuscript form have been observed. 

MIDDLE. The handwriting is average in legibility and attractiveness.
There may be a few violations of rules for manuscript form if there is 
evidence of some care for the appearance of the page. 

LOW. The paper is sloppy in appearance and difficult to read. It may be
excellent in other respects and still get a low rating on this quality. 






Readers learn to use the scale by studying the descriptions of high, 
mid and low values for each feature. Then, they use the scale to score 
samples of student writing. Once the scale is used by readers, they 
discuss the results until they have developed a systematic way of thinking 
about the papers. In order to attain some reliability, both in terms of 
the objectivity with which a paper is graded and the variation in the 
quality of the paper from time to time, Diederich offers the following
guidelines: 

1. Papers must be judged in accordance with specified
written criteria. Each criterion should be weighted.

2. Each paper must be graded independently by two readers. 

Diederich's studies established the analytical scoring method as a
reliable measure of writing ability and offered procedures to follow in 
developing a scale. Procedures, although simple to follow, are 
time-consuming. The first step is to collect large amounts of writing, 
both professional writing and student writing. The second step is to 
analyze the writing carefully so as to identify the prominent features 
which contribute to an effective writing sample. As the reading continues, 
the researchers discuss these features. Gradually, the features are shaped 
and modified until they have become comprehensive enough to describe a 
piece of writing but short enough to be manageable as a scoring guide.

Once the list of features has been determined, the next step is to 
describe in writing in nontechnical language what is considered to be high, 
mid, and low qualities of each feature. Cooper describes this process as 
helpful in "anchoring" the points along a scoring line (Cooper, 1977). 

Finally, the completed guide is implemented to train readers who use 
the guide to score writing samples. Reliability checks are made to 
determine inter- and intra-reader agreement. This step is essential in
research or curriculum evaluation studies. 

The advantages of the analytical scoring method have been described by 
a number of researchers including Cooper (1977), Smith (1979), Pitts 
(1979), and Winter (1979). All agree that the method is quick, efficient, 
reliable, and yields diagnostic information. However, Cooper and Odell
cite as a disadvantage the fact that the criteria for rating are not derived
from a particular writing task or stimulus (Cooper and Odell, 1978).

Holistic Scoring 

Holistic scoring of a writing sample is based on the assumption that 
the effectiveness of a piece of writing can be judged by readers'
impressions of the writing as a whole. In scoring a writing sample using 
holistic scoring, an inductive procedure is followed. After the writing 
samples are collected the readers sort the samples into four stacks. The 
quality of the essay is judged only in relation to the other essays in the 
group rather than to a pre-determined rubric (Mellon, 1975). Once the 
papers are sorted, the readers then identify variables which appear to 
distinguish better writing from poorer writing. 

The holistic procedure was developed by Godshalk, Swineford, and
Coffman at Educational Testing Service (ETS). The scoring method was the
result of a large-scale investigation which was designed to validate the
use of the writing sample as a way to directly measure writing ability.
The procedure assumes that the factors that make up writing are so closely
related that they cannot be separated (Los Angeles County). The procedure
has been extensively researched over the past twenty years, particularly by
ETS in connection with the writing sample used in the College Board
examinations (Godshalk, Swineford, and Coffman, 1966).

In order to develop the holistic procedures, Godshalk and his 
associates collected 646 papers written on three different essay topics as 
follows: "Pen Pal," "Imagine," and "Teenager." These papers were read by 
twenty-five experienced readers who scored each paper on a three-point 
scale. Each paper was given five readings; an analysis of variance was
then applied to the scores. Results of the study validated the use of a
writing sample as a measure of writing ability. When the writing scores
were combined with objective test results, an estimated reader reliability
of .92 was established. However, scorers tended to use the mid-point of the
scale when in doubt about the quality of a paper.

The procedure was field-tested with a four-point scale by Godshalk and 
his colleagues in 1966 in an effort to increase reader reliability by
forcing the score away from the middle. In the field test, 533 papers were 
written on two of the original topics. The papers were read by 145 
readers. Each paper was read five times by readers using a four-point 
scale and five times by readers using a three-point scale. The scores were 
then subjected to an analysis of variance. Findings demonstrated that 
scorer reliabilities, test validity, as well as other "factors, tended to be 
more efficient on a four-point scale than on a three-point scale. 

The holistic scoring procedure has been adapted to the scoring of 
writing samples which are a part of the College Board's English Composition 
Test. The following scoring procedures have been implemented to score 
writing samples from the College Board Test: 

a. Each writing sample is scored independently on a four-point 
scale. 

b. In scoring the paper, each reader makes two judgments: 

(1) First, the reader decides if the paper merits placement in the 
"upper half" or the "lower half." 

(2) Second, the reader decides if the paper is good enough to rate 
a "4" or weak enough to rate a "1." 

c. The paper is read again by a second reader who follows the same 
procedures. 

d. The scores from the two readings are summed resulting in a total 
score which may range from two to eight. 

e. If there is a discrepancy of two score-points, the score must be 
reconciled by a third reader (Lutz, 1981). 
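
The College Board procedure listed above can be sketched in a few lines of 
Python. This is an illustration only, not the College Board's own method: 
the function names and the boolean inputs standing for a reader's two 
judgments are hypothetical, and reading "a discrepancy of two score-points" 
as a difference of two or more points is an assumption. 

```python
from typing import Tuple


def reader_score(upper_half: bool, extreme: bool) -> int:
    """Turn one reader's two judgments into a score on the four-point scale.

    upper_half: first judgment, does the paper merit the upper half?
    extreme:    second judgment, is it good enough for a 4 (upper half)
                or weak enough for a 1 (lower half)?
    """
    if upper_half:
        return 4 if extreme else 3
    return 1 if extreme else 2


def combine_readings(first: int, second: int) -> Tuple[int, bool]:
    """Sum two independent readings and flag papers needing a third reader.

    Assumption: a "discrepancy of two score-points" means the two readings
    differ by two or more points.
    """
    for score in (first, second):
        if score not in (1, 2, 3, 4):
            raise ValueError("each reading must be 1, 2, 3, or 4")
    total = first + second
    needs_third_reader = abs(first - second) >= 2
    return total, needs_third_reader
```

For example, readings of 3 and 4 would be summed without further review, 
while readings of 1 and 3 would be flagged for a third reader. How the 
third reader reconciles the score is not specified in the source and is 
left out of the sketch. 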



Readers for the College Board undergo intensive training sessions in 
which the standards for scoring are set. Under the guidance of a Chief 
Reader, Readers score samples actually written for the test. The assigned 
topic is studied by the readers as they discuss and agree on the kind of 
writing that was required. Then papers are read and scored. Agreement 
tallies are made to determine that standards are followed. Because the 
readers are judging writing written on an assigned topic and because each 
paper is judged in relation to the other papers, scores are expected to 
fall into a pattern resembling a normal distribution. 

Cooper states that general impression marking (holistic) is closer to 
analytic scoring than it may appear. In analytic marking, a reader 
compares the writing to a predetermined, printed rubric which describes 
high, middle, and low writing. While in holistic marking one does not 
utilize a printed rubric, scorers do read and discuss sample papers until 
they identify features or qualities which guide their judgment. 
Therefore, even though a rubric is not printed, a common scoring guide is 
understood among the readers. In analytical marking, it is not assumed 
that the pattern of scores resembles a normal curve. All of the papers or 
none of the papers may embrace the specified features (Cooper, 1977). 

Primary Trait Scoring 

When the National Assessment of Educational Progress (NAEP) set out to 
assess the educational attainment of skill in writing among four age groups 
(9-, 13-, and 17-year-olds, and adults) on a national basis, a writing sample 
was required. Holistic scoring as practiced by ETS readers in the scoring 
of College Board writing samples was used in the first program. The 
scoring procedures were severely criticized by English educators in the 
nation (Mellon, 1975). 

Criticism of the national writing assessment was varied. The major 
criticism from the English teachers concerned the ranking procedures 
employed. According to Mellon, ranking essays yields very little 
information about a piece of writing other than that some writing samples 
are better than others (Mellon, 1975). Mullis charged that the score 
points were almost impossible to describe. For example, if one paper were 
scored a 2 and another a 4, one did not know what factors contributed to 
one paper being better than the other (Mullis, 1980). The most convincing 
argument against holistic scoring is presented by Lloyd-Jones (Cooper and 
Odell, 1977). According to him, the assumption is made in holistic scoring 
that if a writer is effective in one mode of writing, that writer is 
effective in all modes. In response to this criticism, NAEP asked 
Lloyd-Jones and Carl H. Klaus to develop a system of scoring writing 
samples which defined precisely the writing mode being evaluated and to 
develop scoring guides to evaluate that mode. In developing the Primary 
Trait Scoring System, Lloyd-Jones and Klaus first chose a three-part 
discourse model which included explanatory, persuasive, and expressive 
writing modes. Next, they described each mode of writing in terms of the 
purpose of the discourse, the role of the writer, the effect on the 
audience, and the content required by the writing task. After describing 
the discourse, the researchers then developed writing exercises which would 
stimulate respondents to address the writing mode as defined. Since the 
writing mode had been carefully defined, the writing exercise was thereby 
restricted. This restriction created new problems in item development for 
test makers. Chances were increased that the writing exercise would fall 
outside the experiences of the respondents. With no knowledge of the 
specified situation, a writer's response would not be a valid indication of 
his ability to write effectively (Lloyd-Jones, 1977). 

Each exercise was field tested and the responses analyzed to determine 
if writers interpreted the exercise as testmakers intended. If writers did 
not respond as expected, the writing task was revised. The task was 
recategorized, rephrased, or completely changed. 


Once the task was refined, a scoring guide was created for the 
exercise. A complete scoring guide consists of the following elements: 

1. The exercise itself. 

2. A description of the rhetorical trait of the writing. 

3. An interpretation of the exercise indicating how each element of 
the stimulus is expected to affect the respondent. 

4. An interpretation of how the situation of the exercise is related 
to the primary trait. 

5. The actual scoring guide which is a shorthand system to be used in 
reporting descriptions of writing. 

6. Sample papers which have been scored as representative of each 
score point. 

7. Discussions of why each sample paper was scored as it was. 

When a complete guide had been developed, it was given a feasibility 
check by Westinghouse Learning Corporation under the direction of Louise 
Diana. The observers used the scoring guide to rate papers which had 
previously been rated by the testmakers. The two scores were analyzed to 
determine if they were related. The reliability data were judged to be at 
least as good as data obtained in holistic scoring. 

The Primary Trait Scoring System was implemented by NAEP in the Second 
Writing Assessment (1974) and used again in the Third Writing Assessment 
(1979). Holistic scoring was also used to determine if a group of papers 
written in 1979 was better than a group written in 1974. For this purpose, 
the same items were used in 1974 and 1979. Primary Trait Scoring was used 
to provide specific information about particular rhetorical aspects of 
papers. Papers were also scored for cohesion, that is, the general ways 
words and ideas were linked together in writing so as to create a sense of 
wholeness. In addition to being rated for quality, papers for two items 
were analyzed in terms of their syntactic and "mechanical" features. Thus, 
NAEP utilized a variety of scoring procedures in order to determine the 
status of writing in the nation. 






Summary of Consultation with Identified Authorities 



On March 20, 1981, Dr. Alan Purves, Dr. William Lutz and Dr. Ina 
Mullis met on the Louisiana Tech campus with project directors McCready and 
Melton to discuss the current status of writing assessment in the United 
States and to plan a national conference on writing assessment to be held 
in New Orleans in the summer. On page is the agenda of topics addressed 
in the March 20 conference. 

A basic question that was addressed initially was, "What is the role of 
large-scale assessment and can its purpose be achieved with an objective 
measure?" The consultants all agreed that the political impact of writing 
assessment is significant and that the impact of a writing sample is 
greater than that of the objective measure. The consensus was that the 
objective measure can yield valuable information but that the inclusion of 
the twenty-minute essay adds a dimension to the assessment in terms of 
information yielded and political impact. Purves pointed out that in 
numerous other countries, writing samples are considered important because 
of political spinoffs. All agreed that in order to determine one's writing 
ability, we must have him write. 

For purposes of meeting a legislative mandate, a statistically sound 
sample is sufficient. However, to provide feedback at the classroom level, 
every paper would need to be scored. Whether or not scoring every paper is 
feasible depends on the purpose of the assessment. If written composition 
is a priority item in the education budget, then it is feasible to score 
every paper down to the individual school level. In fact, scoring every 
paper might tend to have more of an impact on instruction while at the same 
time providing information (data) to meet the need for a statewide 
generalization about writing. If every paper is not scored, impact is 
diminished because it takes a long time for information to filter down from 
the top. Everyone seemed to agree that the writing sample is desirable and 
that a sample would suffice if the resulting data is adequately wrung for 
information and if that information is reported in a usable form to local 
education agencies. 

Rebecca Christian, of the Louisiana Bureau of Accountability, 
acknowledged the need to test every youngster because of the instructional 
implications. However, the cost of scoring is the source of the problem. 
She supported the scoring of a sample, and then making examples of 
responses available to teachers from their own classrooms. The state could 
make provisions for familiarizing teachers with the scoring procedure and 
give these teachers the opportunity to evaluate their own students' papers. 
Mrs. Christian also commented on the inherent difficulties of reporting the 
results of the primary trait system of scoring. As a result, Louisiana is 
simply not getting the full benefit of scoring a writing sample. In her 
opinion, if the primary trait system of scoring is used in its present 
form, even scoring every paper may not yield the state enough information 
to justify the cost. 

The problem with a general writing assessment report to a state 
legislature is that there is a tendency of people looking at this data to 
say that the state has been engaged in the process for five years and yet 
nothing is happening; therefore, student writing is not improving. If all 
papers are scored and the data is broken down into various categories, it 
becomes easier to show that students improved in one category but did not 
improve in another and that students in this system or school exhibit these 
strengths and weaknesses. Therefore, if every student is assessed, a data 
base becomes available which can be reassembled in a variety of ways. If 
the need is for a generalization for the whole state, that is available. 
However, if the need is for reporting at the classroom level, data is 
available for that purpose. 

All the consultants recommended a writing sample and agreed that it 
has political fallout. All agreed that assessing every student is 
desirable but that a random sample would be effective also, if an effort is 
made to wring the data. 

The consultants for the March 20 conference were selected based on 
their individual expertise about and experience with the various techniques 
of scoring written composition. In order to meet the objectives of the 
meeting, the directors asked that Ina Mullis represent primary trait 
scoring, that Bill Lutz represent holistic scoring, and that Alan Purves 
represent analytic scoring. 

Purves shared the research findings of a study done by some of his 
graduate students using papers of 17-year-olds. The papers were scored 
using the Diederich Scale, a holistic scale, and primary trait. In terms 
of cost, according to the study, there was not a significant difference 
between primary trait and holistic. Both were quick. The Diederich Scale 
required three times as much time as either of the other two methods. 
However, there was a higher inter-rater reliability (.92) and intra-rater 
reliability (.80) with the Diederich Scale. 

One of the topics discussed at some length by the consultants was what 
to do about scoring mechanics. NAEP makes a practice of pulling an 
additional sample to score for mechanics with one reading. In New Jersey, 
papers representing the various score points on the holistic scale are 
pulled and are scored descriptively for mechanics. At NAEP separate 
scorers are used to score mechanics. These scorers do not score for 
primary trait. One consultant strongly felt that scoring for mechanics 
reinforces the public misimpression that quality of writing is directly 
related to punctuation. His finding was that by scoring mechanics, we 
simply tend to feed that idea. He also asserted that it seems unfair to 
score a first draft for mechanics. 

All consultants agreed that American students lack a clear perception 
of what is involved in the process of revision. When given the opportunity 
to revise, students generally want to recopy or, if any revision is done, 
it is cosmetic in nature. 

One point of view which was projected where scoring of mechanics was 
concerned was that by singling out and scoring for various aspects of 
mechanics, a message is sent to teachers which might be interpreted as, "I 
need to teach more capitalization in my classes," rather than "I need to 
emphasize writing in my classes." The message which should be sent to 
teachers in answer to the question, "What should I do to get students ready 
for the writing assessment?" is to teach them to write by having them write. 

There was some discussion among the consultants concerning differences 
among the three methods of scoring: holistic, primary trait, and 
analytical. It was pointed out that the New Jersey program was designed to 
be as non-threatening as possible. As a result, a decision was made not to 
develop a scoring guide. Through discussion the scorers move toward 
agreement and verbally describe each level of the paper. 

It was pointed out that the major difference between holistic 
scoring and primary trait scoring is that in primary trait scoring, student 
writing is measured against external criteria rather than papers being 
compared with one another. Also, the holistic approach tends to measure 
general fluency whereas primary trait is more concerned with the specified 
purpose of the writing. If general fluency were being measured for the 
national assessment, NAEP would consider it necessary to use some external 
criteria. It was pointed out that with holistic scoring the assumption is 
that fifty percent of those tested can write better than the other fifty 
percent, which is known before the assessment begins. Rank ordering can be 
used for measuring change. Assessment years can be compared to see which 
ones come out on top. The feeling at NAEP is that holistic scoring is 
efficient for measuring change in general fluency, but in terms of comparing 
writing data with the rest of the data of the National Assessment 
(mathematics, reading, etc.) the Primary Trait method is more efficient and 
more suitable. 

One of the problems cited with holistic scoring is getting scorers to 
score a paper a "four." English teachers are particularly reluctant to 
give a paper a "four." Trainers have to be moved away from the idea that a 
four represents an "A." There is also a reluctance to assign the lowest 
score. The six-point holistic scale is being used more now, which seems to 
give the scorers a feeling of a wider spread of scores; this brings about 
the assigning of more high and low score points. English teachers tend to 
be difficult to convince that if the sample is sufficiently large, 
statistically there will be a normal distribution of papers. 

Criteria for selection of scorers were discussed at some length. All 
three consultants indicated that in their experiences, persons with 
backgrounds in English, specifically the teaching of composition, were used. 
All felt that one of the most important variables in the training of 
scorers was background of experience. One of the main reasons that 
Louisiana selected the Primary Trait System of scoring was that since 
classroom teachers were to be used in scoring the assessment, the consensus 
was that teachers with relatively limited backgrounds could be used due to 
the tight structure of the scoring guides. One consultant indicated that 
in selecting prospective scorers, one has to be careful of the person who 
talks a good game but who has very deep-seated unmovable prejudices, some 
of which even the person himself is not aware of. In training scorers, 
ground rules must be set which everyone agrees to follow, regardless of 
whether he/she agrees or disagrees. 

Problems associated with development of writing tasks were discussed 
at some length. All agreed that identifying topics appropriate for boys as 
well as girls, for city as well as rural youngsters, does present problems. 
Using teachers to generate topics does not solve the problem, since many 
topics which teachers think will work do not work in an assessment 
situation. Several variables were pointed out as being important in 
developing tasks. For example, is the topic one which will contribute to 
reader fatigue, or will the topic outrage, upset, or depress the reader? 
Some topics will stimulate only one type of response, a situation which can 
cause "brain numbness" on the part of the reader. One consultant suggested 
that topics which require the writer to assume and defend a particular 
position should be avoided, because the writer might not have a genuine 
position. If this position is not sincere, the writing becomes artificial 
and lifeless. NAEP is moving toward use of prompts which allow the writer 
to draw on personal experiences. 

Of all the modes of writing, narration is the most difficult to 
score. 

Audience identification varies in importance, depending upon the mode 
of writing. One consultant felt that identification of audience places 
artificial constraints on the writer. Students know the audience is 
make-believe and tend to write to the teacher anyway. Audience 
identification was defended on the grounds that once a student leaves 
school he no longer writes for a teacher, but rather for varying audiences. 
Therefore instructionally there should be emphasis on variation of audience 
in the classroom. 

The question of the training of scorers was also addressed. When 
holistic scoring is used, much depends on the chief reader who selects the 
samples to be used in training. Most often training begins by looking at a 
three. If Primary Trait is used, begin with a solid "two," and then move 
to a "three." One difficulty in using primary trait scoring is that there 
is often more difference between "two's" than between the best "two" and 
the worst "three." In training scorers to an analytical scale, begin by 
using holistic scoring for the first papers, then share with scorers the 
rationale underlying the scale before they look at any more papers. In 
analytical scoring some ground rules have to be established as to where to 
score for sentence fragments or punctuation. 

In training scorers to use the holistic method, the chief reader 
assumes responsibility for leading discussions with scorers to lead to 
agreement. Usually these discussions occur about three times during a 
scoring day. Also the chief reader moves among the scorers randomly 
picking up papers, scoring them and checking for agreement. The tables are 
constantly being monitored. Besides the chief readers, there are table 
leaders at each table. 

All the consultants felt that scoring a writing sample is feasible 
depending on its purpose. All agreed that it constitutes the only way to 
find out if students can write or not. Multiple choice items tend to 
measure editing skills more than writing skills. A problem with the 
writing sample is that student achievement is measured based on a first 
draft, one which has not been revised. All agreed that in order for 
scoring to be feasible, it must be streamlined. Primary trait and holistic 
scoring are more efficient, timewise, than analytical scoring, especially 
when the practice is used of pulling a smaller sample to score for 
mechanics. 

It was pointed out that the "State of the Art" of writing assessment 
is not clearly defined at this time but that much progress has been made 
over the past ten years. Everyone hoped that impending budget cuts would 
not cause writing assessment programs to disappear. 



CHAPTER 3 



RESEARCH PROCEDURES AND FINDINGS 



This study included two evaluation designs. One design was utilized 
to determine the relationship between an objective measure of writing 
ability and the use of a writing sample. The other design was used to 
determine the "State of the Art" of writing assessment in the nation. This 
section presents the research methodology employed to answer the research 
questions related to each of the two designs. As this study embraced more 
than one design, the findings for each study are reported immediately 
following the description of the design. 

The Relationship Between Scores on Objective Tests and Scores on Writing 
Samples for Louisiana Students 

The Louisiana State Assessment Program included written expression for 
the first time in the 1979-80 school year. As a part of the writing 
assessment, (October 1979) a sample of writing was secured from students in 
grades four, eight and eleven. From the total population at each grade 
level, 2500 writing samples were selected. The writing samples to be 
scored were identified by means of a stratified random sampling technique 
based on racial-ethnic group, socio-economic level of the region and 
population density. All special education students were deleted from the 
sample. 
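
The sampling step described above can be illustrated with a short Python 
sketch. The report does not say how the 2500 papers were allocated among 
strata, so proportional allocation is an assumption here, and the record 
layout and field names are hypothetical. 

```python
import random
from collections import defaultdict


def stratified_sample(students, strata_keys, sample_size, seed=0):
    """Draw a proportionally allocated stratified random sample.

    students:    list of dicts, one per student (hypothetical record layout)
    strata_keys: names of the fields that together define a stratum
    sample_size: total number of papers wanted (2500 per grade in the study)
    """
    rng = random.Random(seed)

    # Group students by their combination of stratum values.
    strata = defaultdict(list)
    for student in students:
        strata[tuple(student[key] for key in strata_keys)].append(student)

    # Sample each stratum in proportion to its share of the population.
    # Rounding means the total can differ slightly from sample_size.
    selected = []
    for members in strata.values():
        quota = round(sample_size * len(members) / len(students))
        selected.extend(rng.sample(members, min(quota, len(members))))
    return selected
```

Under these assumptions, special education students would be filtered out 
of the list before the function is called, and the strata keys would be 
racial-ethnic group, socio-economic level of the region, and population 
density. 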

The Bureau of Accountability in the Louisiana State Department of 
Education had decided to use an adaptation of the primary trait scoring 
system developed by NAEP to score the writing samples. The Louisiana Trait 
system was developed by National Testing Service based on minimum 
competencies determined by Louisiana teachers. Items and scoring guides 
were developed under the direction of Dr. Stella Lieu (NTS). 

In the fall of 1978, the Louisiana writing assessment instruments were 
field-tested under the direction of NTS. At that time NTS provided 
training for Louisiana state department personnel. Based on the 
observations and recommendations of NTS, items and scoring guides were 
revised for the spring testing. 



In June of 1979 the directors of the project were contracted by the 
State Department of Education to train twenty-five classroom teachers as 
scorers. This training took place in a three-week institute described in 
the narrative below. 

The Training Process for Scorers 

The training of the Institute participants began with an overview and 
history of the Louisiana assessment program presented by staff members from 
the Bureau of Accountability in the State Department of Education. The 
objective of this phase of the training was to enable the participants to 
see their charge in the total context of the state assessment program. 

An overview of the history of assessment was presented to the 
participants, with particular emphasis on the role that criterion-referenced 
testing has played in recent years. This phase of training led to a review 
of the problems associated with the assessment of composition. Various 
methods were reviewed including the holistic approach, writing scales, 
computerized scoring and others. The trainers felt that participants should 
be made aware of the pitfalls and strengths of various approaches to 
assessment of writing so that the primary trait system could be viewed with 
a clearer perspective. 

With this background information, the trainers felt that the 
participants were adequately prepared for an introduction to the Primary Trait 
System of scoring by Dr. Wayne Martin of the National Assessment of 
Educational Progress. 

After an introduction to the Primary Trait system, participants were 
prepared for their introduction to the fourth grade items and scoring 
guides. Demonstration scoring was done by the instructors using test 
papers that demonstrated the various levels of the guides. Several sample 
papers for each score point were discussed in detail. Then fourth-grade 
papers were distributed to participants so that practice scoring could be 
carried out. Packets had been prepared so that instructors knew the score 
designation of the papers being distributed. Discussion always followed 
each scoring. Then several sets of test papers demonstrating various 
score-point designations were distributed so that participants could learn 
to distinguish between papers of varying quality. After completing 
practice scoring for each item at the fourth-grade level, the process was 
completed for items at grades eight and eleven. The rationale underlying 
this approach was that all participants should be familiar with the items 
and scoring guides at all three levels so that a better perception of 
sequential development in writing could be attained. The perception of the 
scorers must be similar from grade level to grade level if the total 
evaluation is to be valid. The process required each participant to verify his 
rating with the minimum proficiencies and, therefore, facilitated a more 
thorough familiarization on the part of the participants with the state 
minimum standards in writing. 

The next phase of training consisted of practice scoring by grade 
level. The participants were divided into three groups, one for each of 
the levels of testing. Packets of five papers of random mixture of score 
points were prepared by the trainers. Participants scored each set of 
practice papers, which were discussed until each reader was in agreement 
with the trainer about the score point. This phase of training was 
continued until scorers were scoring consistently. Reliability coefficients 
were determined daily. 

The third phase of training consisted of practice scoring by 
individuals. This training simulated actual scoring procedures. Packets of ten 
booklets were prepared by trainers. After the participants scored the 
first packet of ten papers, the trainers realized that ten papers are too 
many to be scored in the training process. The discussion following 
scoring appeared to be crucial to reaching agreement. Therefore, the packets 
of ten were divided into packets of five. This procedure made it possible 
for all members of the group to score a packet and then to discuss the 
ratings. 

Training continued with the participants scoring the same five tests, 
comparing their results, and discussing their differing opinions. The 
purpose of this stage of training appeared to be reaching a consensus 
about the score points. Always, the importance of achieving a high 
degree of reliability was emphasized by the trainers. 

Reliability of Scorers 

One of the major objectives of the Institute was that scorers would use the 
Primary Trait System with a high degree of scorer reliability. A reliability 
sample set for each item was scored by each of the groups. Trainers recorded 
each scorer's rating of a paper on each primary and secondary trait, and the 
percent of agreement on each was determined. Table 1 presents the percent 
of agreement for the reliability checks for each group. 
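
The percent of agreement recorded by the trainers can be illustrated with a 
minimal Python sketch. The report does not give the formula, so counting 
only exact matches between a scorer's ratings and the trainer's ratings is 
an assumption, and the function name is hypothetical. 

```python
def percent_agreement(scorer_ratings, trainer_ratings):
    """Percent of papers on which a scorer's rating matches the trainer's.

    Assumption: only exact matches on the score point count as agreement.
    """
    if len(scorer_ratings) != len(trainer_ratings):
        raise ValueError("both lists must rate the same papers")
    if not scorer_ratings:
        raise ValueError("at least one paper is required")
    matches = sum(s == t for s, t in zip(scorer_ratings, trainer_ratings))
    return 100.0 * matches / len(scorer_ratings)
```

For a packet of five papers on which a scorer matches the trainer on four, 
the function returns 80.0. 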

From the resulting data it appears that scorers for fourth grade 
achieved a higher scorer reliability than scorers at other grade levels. 
Reliability tended to vary from item to item. The higher 
reliability might be explained by the fact that the writing task required 
students to write only a simple sentence at the fourth-grade level. The 
scoring guide was carefully defined for scoring the primary trait, a 
factor which eliminated subjectivity. 

Some of the writing stems stimulated students to write more. For 
example, item four caused students to write more than item one and the 
reliability was lower. Apparently, the more students write, the more 
difficult it is to achieve a high degree of reliability of scorers. This 
conclusion is supported by the reliability per cents for eighth and 
eleventh grades. It appears that it is more difficult to define a primary 
trait at higher grade levels. Responses of students at these levels tend 
to be more varied and not as predictable as are responses of students at 
lower grade levels; therefore, more subjective decisions must be made. 

Reliability was consistently higher on the secondary traits of 
capitalization and punctuation. These traits can be explicitly defined and 
are clearly specified in the minimum standards. 

Lower reliability per cents were noted on syntax and spelling. This 
was caused largely by disagreement over the difference between "correct" 
syntax and correct usage. The fact that usage was counted as a 
syntactical error caused some confusion and led to disagreement. 

Description of Instrument 

Accountability personnel in the Louisiana State Department elected to 
use a scoring guide similar to guides developed by NAEP. Writing tasks 
were selected from the Louisiana Minimum Standards Document. Once a task 
was selected, a writing stimulus was developed and a guide was developed to 
score the item. Development of items and guides was under the direction 
of Dr. Stella Liu (Wayne State University) who was contracted by NTS. 



TABLE 1 



Primary Trait Secondary Traits 

Syntax Spelling Capitalization Punctuation 



4th  1. 90 90 85 84 91 96 94 89 

     2. 82 95 77 80 91 89 84 87 

     3. 69 81 77 83 86 82 76 84 

     4. 76 73 85 93 93 71 76 81 

8th  1. 76 77 75 82 92 61 90 92 88 80 

     2. 69 73 88 85 84 71 91 81 73 67 

     3. 78 77 89 74 96 81 94 80 85 84 

11th 1. 77 76 83 75 93 88 90 87 94 80 

     2. 77 76 75 70 93 89 94 89 89 75 



Scoring guides for a primary trait consisted of eight score points 
defined as follows. 



0 - No response 

1 - An attempt is made to respond to the request but fails to complete 
    the specified task. This score indicates writing that is below 
    minimum. 

2 - The writer completes the task with minimal elaboration. This 
    score indicates a minimum proficiency as specified in the 
    Louisiana Minimum Standards for Writing. 

3 - The writer completes the task with elaboration through added 
    detail. Organization may be lacking. This score indicates 
    writing that is above minimum but not excellent. 

4 - The writer completes the task with elaboration through expanded 
    details and descriptive terms. Organization is readily 
    discernible and mature. This score indicates excellence in 
    writing. 

7 - Illegible. No further scoring. 

8 - Illiterate. No further scoring. 

9 - Misunderstands the task or writes on a totally different subject. 
    No further scoring. 



A scoring guide was developed for each item. Criteria for determining 
each score point were defined for each item. Scoring guides were used in 
scoring the writing responses secured in the field test. Problems were 
identified and revisions were made in guides. The revised guides were then 
implemented in scoring the state assessment program. 

Mechanics were scored as secondary traits. The four secondary traits 
which were scored included spelling, syntax, capitalization, and 
punctuation. The scoring guides that were developed were based on the 
Minimum Standards document. The score point 2, which represented minimum 
proficiency, was assigned if the writer did not violate any convention 
specified for mastery at the designated grade level. If the writer 
violated a convention specified for mastery, a score of 1 was assigned. 
Above minimum was represented by a score of 3. This score was assigned if 
the writer attempted and used correctly any convention which was designated 
to be introduced but not mastered at a given grade level. A score of 4 was 
assigned if no errors were made. Examples of scoring guides are provided 
in the Appendix. 

1979 Statewide Writing Assessment 

The statewide writing test was administered in October 1979. In 
December 1979, scorers were reassembled for a one-day refresher session at 
the State Department of Education. State Department staff distributed 
writing samples to be scored. Each scorer received a packet of one hundred 
papers, which they were to take home and score. When the entire packet was 
scored by the first scorer, the packet was mailed to a second scorer with 
the score sheets being sent to the State Department. When the second 
scorer had completed his/her work, any discrepancies were resolved by a 
third scorer. 

All score sheets were sent to Multi-Media in Brazille, Louisiana to be 
key-punched. All data were filed on a tape which was forwarded to the 
researchers at Louisiana Tech. The tape was not compatible with the 
computer at Tech. Therefore, the data were entered into the Tech computer 
for analysis. 

Scorer Reliability 

To determine rater agreement, a chi square test for independence was 
applied to determine the association of the first rater's scores to the 
second rater's scores on a designated trait for a writing sample scored by 
both raters. A SAS computer program was used to analyze the data. 
Contingency tables were generated for each trait scored on each item. Chi 
square provided a measure of discrepancy between observed cell frequencies 
and those expected on the basis of independence. If the chi square was 
found to be significant at the .01 level, the null hypothesis that 
no difference existed between the observed and expected values was 
rejected. The alternate hypothesis that the two values were associated was 
accepted. All values were found to be associated. 

From the chi square statistic, a contingency coefficient was 
generated. The contingency coefficient appeared to be the best correlation 
coefficient suited to the data. The contingency coefficient is a 
descriptive measure of the association between two nominal values and is 
independent of the ordering of the rows and columns on a contingency table. 
The minimum value is zero. The maximum value of the contingency 
coefficient was .943 for the primary traits and .894 for the secondary 
traits. A summary of the results is shown in Tables 2.1, 2.2, and 2.3. 
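
The statistics described above can be illustrated with a short sketch. It is not the SAS program used in the study; the function names and the small tables of counts are hypothetical. The sketch computes the chi-square statistic from a table of first-scorer by second-scorer ratings, derives the contingency coefficient as the square root of chi-square / (chi-square + N), and gives the largest value the coefficient can take for a square table with k categories, the square root of (k - 1) / k. Under that formula the reported maxima of .943 and .894 would correspond to nine and five categories; this is an inference and is not stated in the report. 

```python
import math


def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows.

    Rows are the first scorer's ratings, columns the second scorer's.
    Returns the statistic and the total number of papers N.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected if the two scorers' ratings were independent.
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def contingency_coefficient(table):
    """Contingency coefficient C = sqrt(chi2 / (chi2 + N))."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))


def max_contingency_coefficient(k):
    """Upper bound of C for a k-by-k table: sqrt((k - 1) / k)."""
    return math.sqrt((k - 1) / k)


# Hypothetical counts: 30 papers, two scorers in complete agreement
# on a three-point scale. C then reaches its maximum for k = 3.
perfect_agreement = [
    [10, 0, 0],
    [0, 10, 0],
    [0, 0, 10],
]
print(contingency_coefficient(perfect_agreement))
print(round(max_contingency_coefficient(9), 3))  # 0.943
print(round(max_contingency_coefficient(5), 3))  # 0.894
```

Because the upper bound depends on the number of categories, a coefficient is best read against the maximum for its own table and not against 1.0. 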

Conclusions 

From the data analysis it is evident that there is little relationship 
between the first scorer's rating of traits and the second scorer's rating 
of the same traits. No variables contributing to this lack of relationship 
can be identified by the statistical analysis of the data. However, a 
study of the data analysis suggests certain variables that may contribute 
to the variance. In the following section, the variables are identified 
and recommendations are presented. 

Scoring Guides 

1. Scoring guides tended to be so general that too much 
interpretation was left to the discretion of the scorers. 

2. There was apparent confusion among scorers in distinguishing a 
writing response rated as a three from one rated as a four. 
Seemingly, scorers could not agree on a writing response that was 
simply above minimum (3) and one that was exemplary writing (4). 

3. Scorers could not agree on writing that was illegible or 
illiterate. For some scorers, if a sample was illegible, it was 
considered illiterate. 

4. Scorers had difficulty agreeing upon writing that was below 
minimum (1). 

Scoring Process 

Scorers all evaluated the writing response at home over the Christmas 
holidays. Each scorer established his/her own hours and scoring schedule. 
At no point did scorers come together for sessions to clarify 
interpretations. Therefore, the entire scoring process was uncontrolled. 

Items 

An examination of the items resulted in the following conclusions. 

1. Format of some of the items caused students to lose sight of the 
primary trait in their writing. The provision in the test format 
of too many lines caused students to write to fill those lines 
rather than to address the trait. 

2. On fourth grade, Item 1, the direction "describe as much as you 
can about the picture" violates testing and measurement 
guidelines as well as the scoring guide. The scoring guide was 
structured to evaluate an item describing the location of persons. 
When students wrote as much as they could about the picture, they 
lost sight of location. 

3. The eighth grade Item 2 posed two conflicting issues for the 
student to address, one on smoking and one on integrity. Students 
had difficulties in organization because of the nature of the 
item. 

4. The eleventh grade Item 2 presented alternatives for the student 
to select and write a persuasive response. The nature of the 
alternatives varied so much that one scoring guide could not be 
adapted to all of them in the same way. 

Training Process 

There was too great a lapse of time between the training process and 
the actual scoring. 

Recommendations 

1. The following recommendations are suggested for revision in the 
scoring guides: 
a. Each rating on both primary and secondary trait scoring guides 
must be clearly defined. In the appendix is a recommended 
scoring guide for each secondary trait at each grade level. 
The rating of two has been defined as writing which adheres to 
all conventions designated for mastery by the respective grade 
level in the minimum standards document (i.e., those items 
designated with three stars). The rating of three has been 
defined as writing which adheres to all conventions which have 
been introduced and are on-going (i.e., those items that are 
designated with two stars). The rating of one is simply 
defined as writing which does not meet the qualifications of a 
two. 

b. A scoring guide for syntax has been designed for grade four. 
Separating the scoring of syntax from the scoring of spelling 
will remove the need for interpretations by the scorer. Also, 
scoring for syntax at fourth grade makes the scoring the same 
for all three grade levels. 

c. On the primary trait scoring guides the following 
recommendations are made: 

1) Eliminate the rating of four. 

2) Collapse the ratings of seven and eight (illegible and 
illiterate). 

2. The following recommendations are suggested for revisions in the 
scoring process. 

a. Scoring must be tightly controlled. Scorers must score as a 
group under the direction of trainers. Scoring guides must be 
strictly followed. A scoring schedule should be outlined by 
which scorers would systematically score a set of papers, 
exchange with a second scorer, with a third scorer reconciling 
discrepancies. Opportunities should be provided for scorers 
to regularly come together for conferences. Conference 
periods appear to be essential to consistent interpretations 
of the scoring guides. 

3. The training process should immediately precede the scoring of the 
samples or be a part of the scoring of the papers. 

Revisions in the Louisiana Scoring Process 

The Teacher Education Department in the College of Education at 
Louisiana Tech University was contracted to sponsor the scoring and 
analysis of writing samples for the Bureau of Accountability, Department of 
Research and Development, Louisiana State Department of Education. The 
writing samples were secured from a sampling of students in grades 4, 8, 
and 11 who responded to writing tasks developed in June 1980. 
The writing tasks were developed by a committee of classroom teachers 
who had participated in scoring the 1979-1980 Assessment. At each grade 
level tested, four writing tasks were developed for each objective tested. 
The writing tasks had been formatted into four test forms at each grade 
level. Each test form was field-tested on a sample of 100 students. The 
number in the sample population was limited in order to facilitate scoring 
of the test. 

It is difficult for the classroom teachers who score the writing 
sample to be released from the classroom. With this sample, teachers would 
be required to score 400 tests. As each test must be scored twice, the 
number to be scored was actually 800. 

The sampling was conducted by the Bureau of Accountability; therefore 
the principal investigators had no control of the sampling techniques or 
distribution and administration of tests. 

The Problem 

The responsibilities of the Teacher Education Department at Louisiana 
Tech included the following: 

1) Direct the scoring of the writing samples 

2) Analyze the data resulting from the writing sample 

3) Furnish reports relating to item analysis and scorer reliability 

The procedures followed by the principal investigators are detailed in this 
report. Findings resulting from the data analysis are presented with 
conclusions and recommendations. 

Scoring the Writing Sample 

The findings from the 1979-1980 Writing Assessment suggested several 
changes that needed to be made in the scoring process. In an effort to 
improve scorer reliability, the principal investigators designed a form of 
"specialized-team" scoring. It was the opinion of the investigators that 
if scorers specialized in one area, scorer reliability would increase 
because each scorer would have to consider no more than two scoring guides. 
Speciality teams were organized as follows: 

1. One person to score Primary Trait 

2. One person to score Spelling and Syntax 

3. One person to score Capitalization and Punctuation 

One person was assigned to score spelling and syntax because of the 
close relationship of the two traits. In order to get a valid measure of a 
student's proficiency with a given trait, it is essential that an error be 
counted consistently the same way each time. If syntax and spelling are 
scored separately, there is a tendency to count a wrong word choice both as 
a syntactical error and as a spelling error. With the same person scoring 
both traits, the scorer tends to score errors consistently in the same 
way. 
Punctuation and Capitalization were grouped together for similar 
reasons. Correct terminal punctuation signals correct use of capital 
letters. Decisions relating to the scoring guides often require that 
consideration be given to both capitalization and punctuation. By grouping 
the two together, it was hoped that scorer reliability would be improved. 

In order to facilitate the scoring of secondary traits, the scoring 
guides for each trait were revised. All conventions which were identified 
in the State Minimum Standards Document as being "minimum" were listed for 
a score of "2". All conventions identified in the State Document as being 
introduced and maintained were listed for a score of "3". (A copy is in the 
Appendix.) Because the conventions which should be adhered to by writers 
at each grade level were specified for scorers, scorers did not need to 
refer to the Document. In this way, the principal 
investigators hoped to improve the scoring of secondary traits. 

Two scoring teams were assigned to each grade level. One resolver and 
one clerk were assigned to each grade level. Test booklets were divided 
into packets of eight. Each scorer on each team was given a packet to 
score. The scoring guide shown on the next page was used to record the 
scores. As soon as all three scorers on a team scored a packet, the 
packets were given to the clerk who removed the first score sheet and 
placed in the packet the score sheet for the second scoring. A copy of the 
second scoring sheet is shown on page [illegible]. Scoring teams then scored 
each packet a second time. When the second scoring was completed, the clerk 
placed the first score sheet beside the second score sheet and circled in 
red all scores that did not agree. The reconciler then considered each 
score where there was a disagreement and resolved it by deciding on one of 
the specified scores. 
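
The clerk's comparison of the two score sheets and the reconciler's decision can be sketched as follows. This is an illustration only: the dictionary layout, the booklet identifiers, and the function names are hypothetical, and the sketch assumes that "one of the specified scores" means one of the two scores already recorded for that trait. 

```python
def flag_disagreements(first_sheet, second_sheet):
    """Return the scores that differ between the two scorings.

    Each sheet maps (booklet, trait) to the score recorded for it.
    The result maps the same key to the pair (first, second), which
    corresponds to the scores the clerk circled in red.
    """
    flagged = {}
    for key, first in first_sheet.items():
        second = second_sheet[key]
        if first != second:
            flagged[key] = (first, second)
    return flagged


def resolve(flagged, reconciler_choices):
    """Record the reconciler's decision for every flagged score.

    The decision must be one of the two scores already assigned
    (an assumption about how the report's procedure worked).
    """
    final = {}
    for key, (first, second) in flagged.items():
        choice = reconciler_choices[key]
        if choice not in (first, second):
            raise ValueError(f"{key}: {choice} is not one of {first}, {second}")
        final[key] = choice
    return final


# Hypothetical packet with one booklet and two secondary traits.
first = {("B01", "spelling"): 2, ("B01", "syntax"): 3}
second = {("B01", "spelling"): 2, ("B01", "syntax"): 2}
circled = flag_disagreements(first, second)
print(circled)
print(resolve(circled, {("B01", "syntax"): 2}))
```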

Findings 

In order to determine scorer reliability, a chi-square test for 
independence was applied to determine the association of the first scorer's 
ratings to the second scorer's ratings on a designated trait for a writing 
sample scored by both scorers. A SAS computer program was used to analyze 
the data. Contingency tables were generated for each trait scored on each 
item. Chi square provided a measure of discrepancy between observed cell 
frequencies and those expected on the basis of independence. If the chi 
square was found to be significant at the .01 level, the null hypothesis 
that no difference existed between the observed and expected values was 
rejected. The alternate hypothesis that the two values were associated was 
accepted. All values were found to be associated. 

From the chi square statistic, a contingency coefficient was 
generated. The contingency coefficient appeared to be the best correlation 
coefficient suited to the data. The contingency coefficient is a 
descriptive measure of the association between two nominal values and is 
independent of the ordering of the rows and columns on a contingency table. 
The minimum value is zero. The maximum value of the contingency 
coefficient is .81 for both the primary traits and the secondary traits. 
The contingency coefficient determined for each pair of scorers for each 
trait is shown in Tables 2.1, 2.2, and 2.3. The contingency coefficient 

TABLE 2.1 

GRADE FOUR 
Scorer Reliability: 
Contingency Coefficient: First Scorer to Second Scorer 

                                    1980                        1979 
Test Form             A             B             C             Last Year 

Primary Trait 
  Item 1              [illegible]   [illegible]   .63           .77 
  Item 2              [illegible]   [illegible]   .80           .80 
  Item 3              .61           .72           .70           .78 

Syntax 
  Item 1              [illegible]   [illegible]   .61 
  Item 2              [illegible]   [illegible]   .59 
  Item 3              .50           .65           [illegible] 

Spelling 
  Item 1              [illegible]   [illegible]   .62           .64 
  Item 2              [illegible]   [illegible]   .69           .60 
  Item 3              .53           .72           [illegible]   .59 

Capitalization 
  Item 1              [illegible]   .56           [illegible]   [illegible] 
  Item 2              [illegible]   [illegible]   .78           .58 
  Item 3              .63           .50           .56           .60 

Punctuation 
  Item 1              .66           .69           .78           .62 
  Item 2              .70           .65           .78           .64 
  Item 3              .66           .60           .78           .61 

Maximum Value 
  Primary Trait       .81           .81           .81           .943 
  Secondary Trait     .81           .81           .81           .894 

TABLE 2.2 

GRADE EIGHT 
Scorer Reliability: 
Contingency Coefficient: First Scorer to Second Scorer 

                                    1980                        1979 
Test Form             A             B             C             Last Year 

Primary Trait 
  Item 1              .45           .77           .79           [illegible] 
  Item 2              .57           .82           .79           .81 
  Item 3              .61           .80           .68           .81 

Syntax 
  Item 1              .64           .67           .72           [illegible] 
  Item 2              .62           .64           .82           .59 
  Item 3              .72           .80           .81           .47 

Spelling 
  Item 1              .44           .70           .65           .52 
  Item 2              .44           .61           .82           .52 
  Item 3              .71           .79           .79           .62 

Capitalization 
  Item 1              [illegible]   [illegible]   .45           .51 
  Item 2              .34           .46           .74           .51 
  Item 3              .66           .70           .76           .53 

Punctuation 
  Item 1              .62           .70           .79           .44 
  Item 2              .42           .65           .82           .48 
  Item 3              .67           .70           .77           .48 

Maximum Value 
  Primary Trait       .81           .81           .81           .943 
  Secondary Trait     .81           .81           .81           .894 

TABLE 2.3 

GRADE ELEVEN 
Scorer Reliability: 
Contingency Coefficient: First Scorer to Second Scorer 

                                    1980                        1979 
Test Form             A             B             C             Last Year 

Primary Trait 
  Item 1              .56           .81           .75           .78 
  Item 2              .72           .80           .73           .78 

Syntax 
  Item 1              .57           .64           .67           .61 
  Item 2              .62           .78           .66           .57 

Spelling 
  Item 1              .53           .52           .45           .59 
  Item 2              .59           .75           .55           .52 

Capitalization 
  Item 1              .51           .40           .52           .55 
  Item 2              .41           .73           .49           .52 

Punctuation 
  Item 1              .29           .39           .51           .52 
  Item 2              .55           .73           .43           .51 

Maximum Value 
  Primary Trait       .81           .81           .81           .943 
  Secondary Trait     .81           .81           .81           .894 

is reported for each test form field-tested. The last column of each table 
reflects the contingency coefficient for the 1979-1980 scorers. 



Conclusions 

Scorers at grades eight and eleven appear to have improved scorer 
agreement. The improvement is particularly noted in the secondary traits, 
especially at grade eight. Scorers at grade eight appeared to have 
excellent rapport with each other and to have a commitment to the 
achievement of scoring agreement. 

The eighth-grade scorers came together throughout the day to review 
how they scored papers, to discuss points of disagreement, and to establish 
scoring guidelines. This discussion appears to contribute to scorer 
agreement. This is reflected in the fact that the contingency coefficients 
improved for each test form scored. 

An increased agreement was not as noticeable at grades four and 
eleven. Several unexpected problems emerged during the scoring process 
that may have contributed to this. These problems are discussed below. 

First of all, there was a "people" problem. The scoring process had 
been carefully planned and scorers were notified well in advance of the 
session. However, on the day scoring was to begin, a scorer at the 
fourth-grade level and at the eleventh-grade could not come. Two 
substitutes had to be secured at the last minute. The fact that the 
substitutes were not prepared to score and missed the training session 
contributed to the lack of scorer agreement for the teams on which they 
scored. Also, one scorer at the fourth-grade level tended to be more 
concerned with "keeping up" with the rest of the scorers rather than 
achieving agreement. The scorer became so distraught that she left before 
the task was completed. Another situation arose when an eleventh-grade 
scorer objected to scoring decisions which had been made concerning the 
relationship between handwriting and capitalization. It is essential that 
basic scoring guidelines be followed by all scorers. Another attitude that 
was noted for the eleventh-grade scorers was the feeling "do not worry about 
it - let the resolver take care of it." The attitude of the scoring team 
appears to be an important element in the achievement of scoring 
agreement. 

The second problem was the amount of training. The primary purpose of 
this scoring session was to score the items which were field-tested for the 
spring assessment. All scorers had received training the previous summer 
or had served as scorers in the 1979-1980 State assessment. Therefore, 
only two hours were scheduled for retraining. This was not enough time. 
More time may have been needed because the scoring guides for both primary 
traits and secondary traits had been revised. However, it was apparent 
that a sufficient amount of time must be allocated to the retraining of 
scorers so that scorers will think in the same way. Discussions throughout 
the scoring process contributed to scorers thinking alike, also. 

An analysis of the agreement of scorers at the fourth-grade level 
seemed to indicate a need for a more extensive training session at this 
level. Scorers appear to need help in recognizing errors in syntax. This 
secondary trait was added to fourth-grade in an effort to improve scorer 
agreement by removing the confusion between syntax and spelling. 
Fourth-grade scorers may need guidelines that are more detailed. 
Certainly, trainers must devote time to a concentrated training session 
with this group. 

The third factor which affected scorer agreement was the items 
themselves. All items and scoring guides had been developed the previous 
summer. One serious problem at eleventh-grade level was with a "persuasion 
item," Item 1. The item objective had been confused by the test developers 
with a description task and the scoring guide tended to score for 
description rather than persuasion. Therefore, agreement was difficult to 
reach. A similar situation was noted at eighth-grade level with the third 
item of Form C. Students had difficulty fulfilling the task and responses 
were harder to grade. The item must stimulate the student to address the 
specified task and the scoring guide must clearly describe each possible 
score. 

Opportunity for Students to Edit 

At grades eight and eleven two forms of the test were formatted so as 
to provide an opportunity for students to edit the writing sample. Space 
was provided for the student to rewrite the sample. The investigators 
counted each test which included a rewritten writing sample. At grade 
eight, of the 100 tests providing the opportunity for rewriting a sample, 
thirty-three (33) included a rewritten sample. At grade eleven, 
twenty-seven (27) of the 100 tests included a rewritten sample. 

Observations made by scorers indicate that students merely copied the 
original paragraph including all errors. In fact, errors on the rewritten 
sample were noted that were not observed on the original sample. 
Apparently, students did not utilize the opportunity to edit and rewrite to 
an advantage. 

Time Required for Scoring 

Scorers were scheduled to arrive at 1:00 p.m. on Thursday afternoon. 
Plans were to spend Thursday afternoon in retraining and Friday and 
Saturday from 8:00 to 4:30 in Scoring. 

Retraining was conducted as scheduled with scoring beginning on Friday 
morning. Scoring moved slowly Friday morning. Obviously, the retraining 
session was not sufficient time for scorers to score in agreement. Much 
time was devoted to discussion. It was hoped that Form A could be scored 
by both teams in four hours. However, Form A was only scored one time at 
the eighth grade level. At the eleventh grade level and at the fourth 
grade level, scorers had scored about 50 booklets for the second round. 
However, by 2:00 p.m. all groups had completed Form A and had begun Form B 
with eighth grade being the last to complete Form A. By the end of Friday 
afternoon with all teams scoring to 5:30 and eighth grade scoring even 
later, 50 booklets of Form B had been scored twice. 
[illegible] scorers early Saturday morning. Many felt pressured 
at being behind schedule. However, by 10:00 a.m. Form B was completed. At 
this time the principal investigators realized that 400 booklets could not 
be scored in two days. They recommended to the State Department Supervisor 
to delete one form from the required scoring. The Supervisor did agree, 
reluctantly, to delete Form D from scoring. This move tended to remove 
pressure from the scorers. Form C was completed by all groups by 5:30 p.m. 
with eighth-grade finishing last. 

Conclusions 

Although two teams of three scorers, with one clerk and one resolver, 
scored 150 test booklets in one day, it was a supreme effort with extreme 
pressure. This number of scorers could, with ease, score 100 booklets. 

Scoring is a tedious task requiring complete concentration. An 
individual can attend to such a task for only a few hours before 
concentration lapses. Scorers cannot score with reliability for long 
hours. After a time, agreement breaks down. If an agency is sincerely 
interested in accurate scoring, then consideration must be given to an 
adequate number of scorers and adequate time to accomplish the task. 

As the eighth-grade group was under much pressure, a task analysis was 
conducted. It was noted that the eighth-grade test was composed of three 
items as compared to two items at eleventh-grade. In order to adjust 
scoring time, the investigators recommended that only two items be included 
on the eighth-grade test. This should balance the time required to score 
the eighth-grade test with the time required to score eleventh and fourth. 

Summary 

If the adjustments as recommended are made in scoring, it appears that 
two teams of three scorers, one resolver, and a clerk - a total of 8 people 
- can score efficiently 100 test booklets in eight hours. This would 
indicate that about 12.5 test booklets can be completely scored and 
resolved in an hour. 
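
The rate quoted above follows directly from the figures in this summary, and the same arithmetic can be used to plan a larger scoring session. The sketch below is illustrative: the function names are hypothetical, and the projection for 400 booklets assumes that the rate of 12.5 booklets per hour holds for the whole session, which the report does not claim. 

```python
def booklets_per_hour(booklets, hours):
    """Booklets completely scored and resolved per hour by one group."""
    return booklets / hours


def hours_needed(booklets, rate):
    """Hours one group needs to score a given number of booklets."""
    return booklets / rate


def person_hours_per_booklet(people, hours, booklets):
    """Staff time spent per booklet, counting every member of the group."""
    return people * hours / booklets


rate = booklets_per_hour(100, 8)              # 12.5 booklets per hour
print(rate)
print(hours_needed(400, rate))                # four forms of 100 booklets each
print(person_hours_per_booklet(8, 8, 100))    # 8 people working 8 hours
```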

If for any reason the number of scorers per team should be reduced, 
specialized scoring would in all probability be rendered ineffective. The 
only purpose of "Specialized Scoring" was to reduce the number of scoring 
guides to be considered by one individual. Scorers tended to like the 
procedure and indicated that it did make scoring easier. 

If the number of scorers were reduced then the investigators 
recommend that the original method of scoring be utilized with one scorer 
scoring all traits. 
Relationship Between the Various Trait Scores 
on the Objective Test and the Writing Sample 



Items on the objective test measuring each domain of spelling, 
capitalization, punctuation, and language structure were determined. The 
number of items correct for each domain was determined for each student. 

Next, each trait score for each item on the writing sample measuring 
that trait was summed for each student. This provided a trait score for 
each student on the primary traits and on capitalization, punctuation, 
spelling, and syntax (except at grade 4). Each student's trait score was 
paired with the corresponding domain score on the objective test. A 
Pearson's r was run to determine the relationship between those two 
scores. 
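
The pairing and correlation described above can be sketched as follows. The student scores are invented for illustration, the function names are hypothetical, and the sketch uses the standard product-moment formula; it does not reproduce the program actually run for the study. 

```python
import math


def trait_totals(item_scores):
    """Sum each student's trait scores over the writing-sample items.

    item_scores maps a student identifier to the list of scores the
    student received on one trait, one score per item.
    """
    return {student: sum(scores) for student, scores in item_scores.items()}


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists of length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: spelling scores on three writing items, and the
# number of spelling items correct on the objective test.
spelling_items = {"s1": [2, 3, 2], "s2": [1, 2, 1], "s3": [3, 3, 2], "s4": [2, 2, 2]}
objective_spelling = {"s1": 9, "s2": 7, "s3": 8, "s4": 10}

totals = trait_totals(spelling_items)
students = sorted(totals)
r = pearson_r([totals[s] for s in students],
              [objective_spelling[s] for s in students])
print(round(r, 2))
```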

Findings 

The following table shows the correlation coefficient for each 
trait-domain score. 

THE RELATIONSHIP OF EACH DOMAIN ON THE WRITING SAMPLE 
TO A CORRESPONDING DOMAIN ON THE OBJECTIVE TEST 

                                        Grade 
                                 4        8        11 

Syntax                                    .37      .46 
Spelling                         .20      .27      .26 
Capitalization                   .17      .33      .29 
Punctuation                      .12      .19      .17 
Total Test to Primary Trait      .50      .54      .42 

Clearly, there was very little relationship between the trait scores 
and the corresponding domain scores as measured on the objective test. 
Apparently, the two tests are measuring two different functions. However, 
there appeared to be some relationship between the Primary Trait score 
received on a writing sample and the total score received on an objective 
test. 

The next question which was addressed in this study concerned the 
information evaluated on the two tests. Did the Scoring Guides for the 
writing samples evaluate the same skills as measured on the objective test? 
Each objective measured on the objective test was determined and compared 
to the respective scoring guide. Findings are reported below. 
Fourth Grade Test 



Spelling. In spelling the objective test was structured to evaluate the 
student's ability to spell beginning consonant sounds, color words, 
numerals, and regular plurals. The writing sample was evaluated in terms 
of the high-frequency words used by the writer. One descriptive item 
stimulated the student to use color words. Other than these specifics, it 
was strictly by chance if the student used the other specified skills 
(plurals of nouns and number names). 

Capitalization. The objective test contained items which measured the 
student's ability to capitalize proper nouns; the pronoun, I; and the 
beginning of the sentence. The writing sample tended to only measure the 
student's ability to capitalize the beginning of the sentence. One item 
stimulated the student to write in the first person and use the pronoun I. 
Unless a student tended to name the characters in a stimulus picture, the 
student was not stimulated to use proper nouns. 

Punctuation. The objective test measured the student's ability to use end 
punctuation of period, exclamation point, and question mark. The writing 
sample only stimulated the student to use the period. 



Eighth Grade and Eleventh Grade Test 

Spelling. The objective test included items which evaluated the student's 
ability to spell the months, contractions, hyphenated compound numbers, 
plurals of nouns ending in s, past tense forms of verbs, and holidays. On 
the writing sample the students were responsible for specific spelling 
patterns. Only the words attempted by the student were evaluated on the 
writing sample. Some students wrote a lengthy sample which increased the 
probability that they would misspell a word. Other students wrote shorter 
samples, attempted fewer words, and made higher scores. There was a great 
variance in the words attempted by students on the writing samples. 

Capitalization. When the items on the objective test were analyzed, the 
items designed to measure capitalization tended to measure conventions 
related to letter writing, writing titles, and other specialized uses of 
capital letters. None of the writing tasks stimulated students to attempt 
the conventions evaluated on the objective test. 

Punctuation. The objective tests evaluated the use of hyphens, the use of 
commas with items in a series, and periods at the end of a sentence and between 
dollars and cents. None of the writing tasks stimulated the use of these 
conventions. 

Syntax. Findings relating to syntax paralleled the findings relating to 
capitalization and punctuation. Conventions which were evaluated on the 
objective tests were not stimulated on the writing task. 
Conclusions. The basic conclusion which can be made is that there was very 
little relationship between scores students received on a writing sample 
and scores students received on an objective test. Obviously, from the 
findings reported, a student's ability to write a minimally acceptable 
writing sample cannot be predicted from a knowledge of the student's score 
on the Louisiana State Assessment of Writing (objective test). No 
indication of the cause of the lack of relationship can be drawn from the 
correlation study. A content analysis was made of each objective test and 
each corresponding scoring guide. The content of the two instruments was 
compared. Clearly, from the findings, the two instruments were not 
designed to measure the same things. Therefore, from this study the 
generalization cannot be made as to whether or not a student's ability to 
write can be predicted from a score on an objective test of writing 
skills. 
The Relationship of Mean Scores for 
Different Socio/Economic Levels of Students 



An evaluation design was outlined in the proposal to answer the 
following three questions: 

1. Are the mean scores on the objective test for each group of 
students in the design significantly different from the mean 
scores for every other group? 

2. Are the mean scores for primary trait on the writing sample for 
each group of students in the design significantly different from 
the mean scores for every other group? 

3. Is a profile analysis constructed for each group of students from 
data yielded from an evaluation of a writing sample similar to a 
profile analysis constructed for that group from data yielded from 
an objective test of writing? 

Circumstances beyond the control of the investigators prevented the 
completion of the research to answer these questions. In order to address 
these three questions, a unique computer program was required which was not 
available in the university computer center. 

Personnel changes in the computer center resulted in changes in 
responsibility so that the person with whom the investigators had been 
working was promoted. The new person was not familiar with the project nor 
with the multi-trait design. Although the investigators were successful in 
getting the necessary data loaded into the computer, a program could not be 
written by the personnel. Therefore, a programmer was needed. By this 
time it was late in the project year. Although budget categories could 
have been adjusted to accommodate this cost, the program could not have 
been developed and run by the closing date of the project. University and 
state policy mandate that all work be completed and billings filed in the 
Comptroller's office by the termination date of the project. Therefore, it 
was not possible to address these questions. 

SURVEY OF NATIONAL TRENDS IN 
WRITING ASSESSMENT 

One of the purposes of this study was to determine the extent to which 
large-scale writing assessment is being implemented in the nation. 

Specifically, the following questions were to be answered: 

(1) How many states and large city school systems are attempting to 
assess writing? 

(2) If writing is being assessed, which measurement technique is 
being used: direct measurement, indirect measurement, or both? 

(3) If direct measurement is being used, what scoring method is used? 

(4) How are assessment results reported and utilized? 

(5) Do decision makers consider direct measures essential to the 
assessment of writing? 

In order to answer these questions, the questionnaire shown on pages 
43 and 44 was mailed to each state department of education and to large 
school systems (see Appendix, page 83, for the selected school systems). 

Findings and Results 

Responses were received from 42 state departments of education, with 24 
of the SDE's indicating a writing assessment program. Table 3 on page 45 
provides descriptive data about each of the writing programs. Of the 24 
states claiming to have a writing assessment program, 22 claimed to assess 
writing using a writing sample. Most of the states using a writing sample 
indicated that they used holistic scoring procedures, with three states 
using primary trait techniques, one state using analytical scoring, and 
three states using both holistic and analytic scoring. Only two states 
used objective measures alone. 

Of the school systems responding, 20 have a writing 
assessment program. Table 4, shown on page 47, summarizes the descriptive 
data for each program. Of the 20 school systems which indicated a writing 
assessment program, 17 use a writing sample. Methods of rating the sample 
tended to be more varied than those used by the SDE's. Holistic 
scoring procedures were reported by seven systems, a combination of 
holistic and analytical scoring by three, and holistic and primary trait 
scoring by one system. At least two systems claimed to use a writing 
scale and three systems claimed to use analytical scoring. 

The data reveal that in most instances where a writing assessment has 
been implemented, the decision resulted from policy rather than from a mandate. 
At the state level, 12 states are responding to a mandate, while only three 
school systems are responding to a mandate. 

When asked if an objective test was used, eleven states indicated that 
an objective measure was used as well as a writing sample. In the eleven 
states not using an objective test, a writing sample was used. Only two 
states used an objective test without a writing sample. In the case of large 
school systems, twelve systems used an objective test. In the eight systems 
where an objective test was not used, a writing sample was used. Only two 
systems used an objective test without a writing sample. 

One of the features set forth by W. James Popham as being 
characteristic of a high quality minimum competency program is that the 
program should assess defensible predetermined competencies (Popham, 
1981). The respondents were asked if minimum standards had been determined 
and if the writing sample measured the predetermined standards. As 
indicated on Table 3, fourteen states had minimum competencies and thirteen 
states indicated that the writing sample measured those competencies. 
Table 4 indicates that fifteen districts have minimum standards and eleven 
districts indicate that the writing sample measured predetermined 
standards. Minimum standards were developed in a variety of ways, with the 
representative committee being the most frequently used process (see 
Tables 5 and 6). 

One of the major difficulties in utilizing the writing sample is 
managing the scoring of the large numbers of samples which result when the 
entire population is tested. On the other hand, if only a sample of the 
population is tested, there can be no direct feedback to the individual 
students. Respondents were asked if they tested an entire population or 
only a sample. Tables 7 and 8 report the results. Only ten states and 
fourteen LEA's test the entire population. Sample size, in most cases, 
was less than 5,000 (13 SDE's and 5 LEA's). 

Another major problem which has appeared in the implementation of 
writing assessment programs is the development of writing tasks which are 
used in the assessment to prompt students to write. Assessments utilizing 
a writing sample are relatively new and as a result very few writing tasks 
are available in item banks. This is not the case where multiple choice 
items for objective tests are concerned. NAEP has developed a bank of 
writing tasks because most agencies are faced with the problem of 
generating effective ones. Respondents were asked who developed the 
writing tasks in their respective samples. As state and local agencies 
indicated on Tables 9 and 10, most of them utilized a committee composed of 
state department personnel, university personnel, teachers, and 
administrators (12 SDE's and 14 LEA's). Contractors were used by six SDE's 
and four LEA's. NAEP items were used by three state departments. 

When a student's ability to write is tested by means of an objective 
test, the majority of items are related to mechanics (see Tables 11 and 
12). Therefore, respondents were asked if mechanics were scored on the 
writing sample. As shown on Tables 9 and 10, fourteen SDE's and fourteen 
LEA's indicated that mechanics were evaluated. However, when the fact is 
considered that most of the agencies employ holistic scoring, it becomes 
apparent that mechanics are evaluated as part of the total impression of 
the writing. Louisiana has made an effort to evaluate mechanics by applying 
the minimum standards as criteria. That is, to be judged as minimally 
competent, a student cannot make an error in the use of a convention 
identified for mastery at a given grade level. Other states scoring by the 
primary trait system identify the percentage of errors by the process of 
counting words written and errors made. Both processes are tedious and 
time consuming. 

Since so much time, effort, and money are being expended in the 
assessment of writing, the agencies were asked how the information was 
reported and utilized. As shown on Table 13, most agencies report in terms 
of percentage correct or percentage of students demonstrating mastery. 
According to state agencies responding, the information is utilized by the 
LEA's and local schools (Table 14). On the other hand, LEA's report that 
the information is utilized mainly by the school and classroom. Only three 
LEA's report their local testing results to the state department and two 
LEA's report to parents. Apparently, the local education agency is 
expected to utilize the results. 

LOUISIANA TECH UNIVERSITY 
Ruston, Louisiana 71272 

Teacher Education 
College of Education 

Dear Colleague: 

The Department of Teacher Education at Louisiana Tech University has 
been contracted by the National Institute of Education to research the "State of 
the Art" of large-scale assessment of written composition. As a part of that 
study, we must determine the status of large-scale writing assessment programs 
currently being used by both local and state education agencies. Therefore, 
we are asking you to complete this questionnaire and return it to us in the 
enclosed self-addressed envelope. 

From the respondents to this questionnaire, ten individuals will be 
selected to participate in an expense-paid conference on writing assessment to 
be held in Louisiana in the summer of 1981. Invitations will be issued to others 
to attend at their own expense. Consultants at this conference will include 
Dr. William Lutz of Rutgers University, Dr. Ina Mullis of National Assessment 
of Educational Progress, and Dr. Allan Purvis of the University of 
Illinois-Urbana. 

Your prompt response to this questionnaire would enable us to complete 
this important NIE assignment and would place your name among those to be 
considered for participation in the NIE conference on writing in the summer of 
1981. 



Sincerely, 

AN EQUAL OPPORTUNITY UNIVERSITY 



Please mark an X in the box beside each response which best describes the assessment program in your school system. 



A. ASSESSMENT PROGRAM IN WRITTEN COMPOSITION 

1. Does your local education agency have a systemwide assessment program in written composition? 

□ YES □ NO 

**If you answered NO, do not complete the remainder of the questionnaire, but please return it to us. 

**If you answered YES, please continue. 



2. How was your assessment program in written composition initiated? 

□ mandated by state legislature □ result of action by local board of education 

□ result of policy set by state board of education □ other 



3. If your school system does have an assessment program in written composition, place an X in the box indicating the 
grade levels tested: 

□ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 □ 11 □ 12 



4. Does your system have a published set of minimum standards in writing? 
□ YES □ NO 



5. How was the minimum standards document developed? 

□ by a committee of classroom teachers □ by a committee representative of all of the above 

□ by a committee of school administrators □ by state department personnel 

□ by a committee of university personnel □ by a contractor 

□ other 



6. How is information from writing assessment utilized? 

□ reported to local districts 

□ reported to individual schools 

□ reported to individual classrooms 



□ reported to the SDE 

□ other 



7. How are the results reported? (Respond to each section) 

□ by subject area tested □ by objective 

□ by domain □ by item 



□ for individual students 

□ for individual classrooms 

□ by grade level within each school 



□ percentage correct 

□ percentile 

□ stanine 



□ by grade level within the school system 

□ by grade level within the state 



□ standard score 

□ other 



B. OBJECTIVE MEASURES OF WRITTEN COMPOSITION 

8. Is an objective test used to measure writing ability? 

□ YES □ NO 

**If you answered NO, do not complete this section, but go to section C. 

**If you answered YES, go to the next question. 



9. If an objective test is used, which of the following domains are measured at the designated grade levels? 

PRIMARY             MIDDLE              JUNIOR HIGH         SENIOR HIGH

□ Content           □ Content           □ Content           □ Content
□ Organization      □ Organization      □ Organization      □ Organization
□ Style             □ Style             □ Style             □ Style
□ Spelling          □ Spelling          □ Spelling          □ Spelling
□ Punctuation       □ Punctuation       □ Punctuation       □ Punctuation
□ Capitalization    □ Capitalization    □ Capitalization    □ Capitalization
□ Syntax            □ Syntax            □ Syntax            □ Syntax



10. If an objective test is used, do items measure specified minimum standards? 
□ YES □ NO 



C. WRITING SAMPLE AS MEASURES OF WRITTEN COMPOSITION 

11. Is a writing sample used to measure writing ability? 

□ YES □ NO 

**If you answered NO, do not complete this section, but go to question 23. 

**If you answered YES, go to the next question. 



12. Does the writing sample measure specified minimum standards? 
□ YES □ NO 



13. Who developed the items for the writing sample? 

□ State Department Personnel 

□ University Personnel 

□ Private Contractor 



□ Classroom Teachers 

□ LEA Administrators 

□ Other 



14. If a writing sample is utilized is the test administered to: 
□ entire population tested 



□ sample of population 



15. How many students are tested at each grade level with a writing sample? 

PRIMARY               MIDDLE                JUNIOR HIGH           SENIOR HIGH

□ less than 5,000     □ less than 5,000     □ less than 5,000     □ less than 5,000
□ 5,000 - 10,000      □ 5,000 - 10,000      □ 5,000 - 10,000      □ 5,000 - 10,000
□ 10,000 - 20,000     □ 10,000 - 20,000     □ 10,000 - 20,000     □ 10,000 - 20,000
□ 20,000 - 30,000     □ 20,000 - 30,000     □ 20,000 - 30,000     □ 20,000 - 30,000
□ 30,000 - 40,000     □ 30,000 - 40,000     □ 30,000 - 40,000     □ 30,000 - 40,000
□ 40,000 - 60,000     □ 40,000 - 60,000     □ 40,000 - 60,000     □ 40,000 - 60,000
□ more than 60,000    □ more than 60,000    □ more than 60,000    □ more than 60,000



16. What types of writing are measured with a writing sample? 

□ narration □ persuasion 

□ exposition □ other 

□ description 



17. Is the writing sample evaluated for mechanics? 

□ YES □ NO 

If YES, which of the following mechanics are evaluated? 

□ spelling □ punctuation □ usage 

□ capitalization □ language structure □ handwriting 



18. Are scorers trained to score the writing sample with a high degree of scorer agreement? 
□ YES □ NO 



19. Statistically, how is scorer agreement determined? 



20. Who scores the writing sample? 

□ classroom teachers □ state department employees 

□ graduate students □ other 

□ test contractor 



21. Who trains the scorers? 

□ State department personnel □ consultants 

□ test contractors □ other 



22. What system of scoring is used? 

□ holistic □ primary trait 

□ analytical □ other 

□ writing scales (specify) 



23. Two schools of thought appear to be emerging concerning the assessment of writing. One school insists that the 
only true measure of writing skill is to have students write. The other school insists that most important mental 
processes, including writing, can be measured well by objective items. Explain why you feel that a writing sample 
either is or is not essential in the assessment of writing. 

Table 3 

States With a Writing Assessment 

       Action                        Have         Is an        Does the Test Is a      Does the Writing
       Initiating      Grades        Minimum      Objective    Measure       Writing   Sample Measure    Method of
State  Assessment      Tested        Standards?   Test Used?   Minimum       Sample    Minimum           Scoring
                                                               Standards?    Used?     Standards?

AL     Policy          3,6,9         Yes          No           No            Yes       Yes               Holistic
OB     Mand./Policy    1-8,11        Yes          Yes          No            Yes       No                Primary Trait
CA     Mandated        3,6,12                     Yes          No            No        No
FL     Mandated        3,5,8,11      Yes          Yes          Yes           Yes       Yes               Analytical
HI     *Ad. Decision   4,8,11        Yes          No           No            Yes       Yes               Holistic
       Policy          9             Yes          No           No            Yes       Yes               Holistic
ME     Mand./Policy    4,8,11        No           No           No            Yes       Yes               Holistic
MO     Policy          9-12          Yes          Yes          Yes           Yes       Yes               Holistic
MA     Policy          7,8,9,12                   No           No            Yes       Yes               Hol./Analytic
MI     Mand./Policy    4,7,10        Yes          A N»         t n           Yes       Yes
MN     *Ad. Decision   4,8,11                     No           No            Yes       No                Primary Trait
NV     Mandated        3,6,9-12      Yes          JO           Yes           Yes       Yes               Holistic
NC     Rec.            11            Yes          No           No            Yes       Yes               Hol./Analytic
NH     Policy          5,9,12        No           No           No            Yes       No                Holistic
NJ     Mandated        9             Yes          Yes          Yes           Yes       Yes               Holistic
NM                     10            In process   No           No            Yes       No                Holistic

TABLE 3 (Cont.) 

States With a Writing Assessment 

       Action                        Have         Is an        Does the Test Is a      Does the Writing
       Initiating      Grades        Minimum      Objective    Measure       Writing   Sample Measure    Method of
State  Assessment      Tested        Standards?   Test Used?   Minimum       Sample    Minimum           Scoring
                                                               Standards?    Used?     Standards?

OH     Mandated        R 19          Yes          Yes          Yes           Yes       No                Holistic
OR     *Ad. Decision   4,7,11        Yes          Yes          No            Yes       No                Holistic
PA     Mandated        5 R 11        No           Yes          No            No        No
RI     Policy          4 r m         In Process   Yes          Yes           Yes       Yes               Holistic
SC     Mandated        6,8,11        Yes          No           No            Yes       Yes               Hol./Analytic
       Mandated        3,5,9         Yes          Yes          Yes           Yes       Yes               Holistic
WY     *Ad. Decision   6,9           No           No           No            Yes       No                Holistic
IA     Mandated        3,7,10        Yes          Yes          Yes           Yes       Yes               Primary Trait

*Ad. - Administrative 



TABLE 4 

School Systems With a Writing Assessment 

                  Action                    Do Districts Is an        Does the Test Is a      Does the Writing
School            Initiating      Grades    Have Minimum Objective    Measure       Writing   Sample Measure    Method of
District          Assessment      Tested    Standards?   Test Used?   Minimum       Sample    Minimum           Scoring
                                                                      Standards?    Used?     Standards?

AR, Little Rock   Policy          1-11      No           Yes          No            No
AZ, Phoenix       *Ad. Decision   9-12      Yes          Yes          Yes           Yes       Yes               Analytical
CA, Monterey      Policy          1-12      Yes          No           No            Yes       Yes               Holistic
FL, Tallahassee   Policy          1-8       Yes          Yes          Yes           Yes       Yes
GA, Atlanta       Policy          1-12      Yes          Yes          Yes           No        No
IA, Des Moines    *Ad. Decision   9         No           No           No            Yes       No                Hol./Analy.
IL, Chicago       Policy          9-12      Yes          No           No            Yes       Yes
KS, Wichita       *Ad. Decision   K-12      Yes          Yes          No            Yes       No                Holistic
MA, Boston        Mandated        2,5,8     Yes          No           No            Yes       Yes               Holistic
MD, Baltimore     Policy          1-9       Yes          Yes          Yes           Yes       Yes               Analytical

*Ad. dec. - Administrative decision 

TABLE 4 (Cont.) 

School Systems With a Writing Assessment 

                  Action                    Do Districts Is an        Does the Test Is a      Does the Writing
School            Initiating      Grades    Have Minimum Objective    Measure       Writing   Sample Measure    Method of
District          Assessment      Tested    Standards?   Test Used?   Minimum       Sample    Minimum           Scoring
                                                                      Standards?    Used?     Standards?

MI, Detroit       Policy          10-12     Yes          Yes          Yes           Yes       Yes               Holistic
NC, Raleigh       *Ad. Decision   1-12      Yes          Yes          Yes           Yes       Yes               Judgment
NM, Albuquerque   *Ad. Decision   10-12     Yes          Yes - 4,6,9  No            Yes       Yes
                                                         to - 10
NM, Santa Fe      Policy          7-12      No           No           No            Yes       No                Hol./Analy.
NY, New York      Policy          8,11      Yes          No           No            Yes       No                Holistic
OR, Portland                      3-9       Yes          Yes          Yes           No        No
TX, Austin        Mandated        3,9       Yes          Yes          Yes           Yes       Yes               Holistic
WI, Madison       Mandated        5,8,11    No           Yes          No            Yes                         Hol./PT
WA, Seattle       Policy          3,6,9,11  Yes          No           No            Yes       Yes               W. Scale
WY, Laramie                       6,9       No           No           No            Yes       No                Holistic

*Ad. dec. - Administrative decision 
Hol./PT - Holistic/Primary Trait 



TABLE 5 

The Development of Minimum Standards in Writing in States 





How V 


as the 


minimum 


stands 


irds document developed? 


How is inform, from writing assessment utilized? 


State 


Clsrm. 
Teacher 


Sen. 
Adm. 


Uni. 
Pers. 


SDE 


1 L-L ClL. L 


Com. 


uuier 


1 trA 


sen. 


Clsrm. 


SDE 


Other 


AL 








X 








A 


X 


X 






CA 














Di ^t* r?p\7 

Dwn 


V 
A 


y 
A 






Sub & Legislature 


HI 
















X 










ID 








X 




X 




X 


X 








LA 








X 








X 


X 


X 


X 




ME 














Oom. Review 


X 








NAEP Studies 


MA 

ilk 
V£> 

MI 












x 




X 


X 






unuer aeveiopuenu 


MN 
























o La u cwxu e jttepur u jl ny 


NH 






















X 


Model by Local Dist. 


NV 


X 


X 


X 






X 


Oom. rep. & 
sp. cons. 


X 


X 


X 




Students & parents 


OH 












X 




X 


X 




X 


Education org. 


OR 
















y 


y 


Y 
A 




leg. & media 


RI 


X 


X 


X 


X 


X 


X 










X 


S.D. of Regents 


SC 
WY 












x 


Dist. level 
rirr. sp. 


X 
X 


X 
X 


X 


X 


Returned to school 


NJ 


1 


1 


1 


1 


1 


x 1 




x 1 


x 1 


x 1 


\ 


Students 






School 
System 



AR, 

Little Rock 
AZ, 

Phoenix 
CA, 

Monterey 
FL, 

Tallahassee 

i/ 

GA, 

Atlanta 

*, 

Des Moines 

< IL, 
Chicago 

OH, 

Cincinnati 
MD, 

Baltimore 
MA, 

Boston 



MX, 

Detroit 



TABLE 6 

The Development of Minimum Standards in Writing in School Systems 



Clsrm 
Teacher 



How was the minimum standards document developed? 



X 



Sen 
Adm 



Uni 
Pers 



X 



SDE 



Contract 



Com. 
Rep. 



Other 



Lang, arts 
sup. & Clsrm T 



How is inform, from writing assessment utilized? 



LEA 



X 



X 



Parents 



com. of t. , 
ad., pub., & 
pers. 

Cen. Staff & 
Sup. 



X 



Sch. 



X 



X 



X 



X 



X 



Clsrm. 



X 



X 



X 



X 



X 



SDE 



Other 



Parents 



9 

TABLE 6 (Cont.) 

The Development of Minimum Standards in Writing in School Systems 





How was thp mi n imiim ct-^rY^rv^e Avumnnf ^Ar,AUtw5o 




School 
System 


Clsrm. 
Teacher 


Sch. 
Adm. 


Uni. 
Pers. 


SDE 


Contract 


Com. 
Rep. 


Other 


nuw j 
LEA 


.to Amor 
Sch. 


m. rrom v 
Clsrm. 


nritmc 
SDE 


1 assessment utilized? 
Other 


NC, 

Raleigh 
NM, 

Santa Fe 
NY, 

New York 
TO, 

Austin 
WA, 

Seattle 

en 

~WY, 
Laramie 

Explanation of i 

i 

How was the min 
Clsrm. - Classn 
Sch. A3m. - Schc 
Uni* Pers, - Un. 
Com. Rep. - Com 

i 

How is informat; 
Sch. - School | 
Clsrm. - Classrc 


X 
X 
X 

tobreviati< 

I 

imim stand 
Dom | 
x>l Admini 
iversity P< 
nit tee Rep 

Lon from wi 

XJTl 


X 
Dns 

ards do 

i 

strator 
srsonne. 
tresenta 

i 

citing 


cument d 

s 
1 

bive 

assessme; 


X 
X 

evelop 
it uti. 


ed? 

lized? 


X 
X 


Coll. of Re- 
search ideas 

Committee 


X 
X 

X 


X 
X 
X 
X 
X 
X 


X 

X 
X 


X 

x. 

X 


Parents/students 



TABLE 7 

Writing Sample as Measures of Written Composition in States 





Is it 


M.S. 




WHO IS TESTED? 


State 


used? 


<5,000 


5,000- 
10,000 


10,000- 
20,000 


20,000- 
30,000 


30,000- 
40,000 


40,000- 
60,000 


>60,000 


Entire 
Pop. 


AL 


Yes 


Yes 












P,M,J 




Yes 


CA 


ifes' 75 


NO 


S 














No 


HI 


Yes 


Yes 


M,J,S 














No 


ID 


Yes 


Yes 






J,S 










Yes 


LA 


Yes 


Yes 


P,J,S 














No 


ME 


Yes 








S 












MA 


Yes 


Yes 














J 


Yes 


MI 


Yes 


Yes 


P,M,J 










* 




No 


MN 


Yes 


No 


M,J,S 














No 


NH 


Yes 


No 


M,J,S 














NO 


NJ 


Yes 


Yes 














s 


Yes 


NM 


Yes 


Yes 


















NV 


Yes 


Yes 




S 




• 








Yes 


OH 


Yes 


No 


J,S 














No 


OR 


Yes 


No 


M, J ,S 














No 


RI 


Yes 


Yes 


P,M,J,S 














Yes 


SC 


Yes 


Yes 














M,J,S 

i 


Yes 


NY 


Yes 


No 


M,J 




I 


I 


I 


I : 
! I 


Yes 



TABLE 7 (Cont.) 
Writing Sample as Measures of Written Composition 



State 


Is it 
used? 


M.S." 


WHO IS TESTED? 


<5,000 


5,000- 
10,000 


10,000- 
20,000 


20,000- 
30,000 


30,000- 
40,000 


40,000- 
60,000 


>60,000 


Entire 
Pop. 


MD 
FL 
DE 
PA 
TX 
NC 

Explanation of 

M.S. - Minimum 
P. - Primary 
M. - Middle 
J. - Junior Hi 
S. - Senior Hi 


Yes 

Yes 

Yes 

No 

Yes 

Yes 

Abbrev 
Standa 

gn 
gh 


Yes 
Yes 
No 

Yes 
Yes 

iation 
rds 


P,M,J,S 
J 

S 

s 












S 

P,M,S 


Yes 

No 

No 

Yes 
No 



TABLE 8 

Writing Sample as Measures of Written Composition in School Systems 



School 
System 


Is it 
used? 


M.S. 






WHO IS 


TESTED? 






<5,000 


k nnn 

10,000 


10,000- 
20,000 


20,000- 
30,000 


30,000- 
40,000 


40,000- 
60,000 


>60,000 


Entire" 
Pop. 


AR, 

Little Rock 


















■ 




AZ, 

Phoenix 


Yes 


Yes 




s 












Yes 


CA, 

Monterey 


Yes 


Yes 


P,M,J,S 














Yes 


FL, 

Tallahassee 


Yes 


Yes 




M,J 












Yes 


GA, 

Atlanta 


No 




















IA, 

Des Moines 


Yes 


No 


J 














Yes 


IL, 

Chicago 


Yes 


Yes 














s 




OH, 

Cincinnati 


Yes 


Yes 


P,M,J,S 














Yes 


MD, 

Baltimore 


Yes 


Yes 


M 






S 




J 


p 


Yes 


MA, 

Boston 


Yes 


Yes 




P,M,J,S 












Yes 


MI, 

Detroit 


Yes 


Yes 






s 






— , 




Yes 



TABLE 8 (Cont.) 

Writing Sample as Measures of Written Composition in School Systems 



School 
System 


Is it 
used? 


M.S. 






WHO IS 


TESTED? 






<5,000 


5,000- 
10,000 


10,000- 
20,000 


20,000- 
30,000 


30,000- 
40,000 


40,000- 
60,000 


>60,000 


Entire 
Pop. 


NC, 

Raleigh 


Yes 


Yes 


P,M,J,S 












■ 


No 


NM, 

Santa Fe 


Yes 


No 


J,S 














Yes 


NY, 

New York 


Yes 


NO 














J,S 


Yes 


TX, 

Austin 


Yes 


Yes 


P,M,S 












• 


Yes 


WA, 

Seattle 


Yes 




p m .i <; 














Yes 


WY, 

Laramie 


Yes 


NO 


M,J 
















Explanation of 


Abbrev 


iation 


s 
















P. - Primary 
M. - Middle 
J. - Junior Hi< 
S, - Senior Hie 

i 


3h 
3h 
















s 






TABLE 8 (Cont.) 

Writing Sample as Measures of Written Composition in School Systems 



School 
System 


Is it 
used? 


M.S. 




WHO IS TESTED? 


<5,000 


5,000- 
10,000 


10,000- 
20,000 


20,000- 
30,000 


30,000- 
40,000 


40,000- 
60,000 


>60,000 


Entire 
Pop. 


WI f 

Madison 
OR, 

Portland 
NM, 

Albuquerque 
KS, 

Wichita 
Explanation of 

i 

M.S. - Minimum 
P. - Primary 
m. - Middle 
J. - Junior Hi< 
S. - Senior Hi< 


Yes 
No 
Yes 
Yes 
Abbrev 

i 

Standa 

3h 
3h 


NO 

Yes 

No 

iation 
rds 


M,J,S 

s 


P,M,J,S 


P,M,J,S 










No 

Yes 
Yes 



TABLE 9 
Key Participants 
Development of the Writing Task in State Assessment Programs 



r*i. _ i. _ 


Who c 


Jevelopec 


the it 


:ans for ti 


le writing sample? 


Type 


Eval. 


Eval. Mechanics 


state 


SDE 


Uni. 
Pers. 


Con. 


Teacher 


LEA 


Other 


N. 


E. 


D. 


p. 


0. 




S. 


c. 


p. 


L. 


u. 


H. 




X 


X 




X 


X 


Committee 


X 










Yes 


X 


X 


X 


X 


X 


X 


1 CA 

1 


X 


X 




X 


X 


Committee 




X 


X 


' 




Yes 


X 


X 


X 


X 


X 


X 


HI 












Spec. Task 


X 


X 


X 1 


X 




Yes 


X 


X 


X 


X 


X 




ID 


x ; 












X 


X 


X 


X 




No 














LA 








X 




Committee 


X 


X 


X 


X 




Yes 


X 


X 


X 


X 


X 


X 


ME 












NAEP 


X 


X 


X 






No 














MA 

i 


X 






X 




Commitee 






X 


X 




No 














j 

MI 


X 


X 








NAEP 


Xud 


Xud 


Xud 






Yes 














MN 


X 


X 




X 


X 


Committee 


X 


X 


X 


X 


X 


Yes 


X 


X 


X 


X 


X 




NH 


X 












X78 


X80 








Yes 


X 


X 


X 


X 


X 




NJ 






X 








X 










No 














m 


X 






X 


X 


Committee 






X 


X 


B.L. 
















NV 


X 






X 




Committee 




X 


X 


X 


B.L. 


Yes 


X 


X 


X 


X 


X 


X 


OH 


X 


X 




X 




Committee 


X 


X 


X 


X 




Yes 


X 


X 


X 


X 


X 


X 


OR 


X 












Y 

d\ 


v 

A 


Y 
A 


A 




NO 














RI 






X 








X 






X 




Yes 


X 




X 


X 






SC 


X 


x 


x 


X 


X 


Rev. Group 


X 


X 


X 


X 




Yes 


X 


X 


X 


X 


X 


X 


WY 


X 


x I 

■ J -L 




X 






x 1 
1 

1 




X 


1 
1 


1 
1 


Yes | 

, 1 


x 1 

1 


x 1 

1 


x 1 

1 


x 1 

1 


x 1 

1 


x 1 

1 



ud - under development 



TABLE 9 (Cont.) 

Development of the Writing Task in State Assessment Programs 



State 


Who c 


evelopec 


the it 


:ems for tY 


le writing sample? 


Type 


Eval. 


Eval. Mechanics 


SDE 


Uni, 
Pers. 


Con. 


Teacher 


LEA 


Other 


M. 


E. 1 D. 


p. 


0. 




S. 


c. 


p. 


L. 


u. 


H. 


MD 

FL 

DE 
TX 
NC 

Explanation of 
1 

Who developed 
Uni. Pers. - U 
Con. - Contrac 

N. - Narration 
E. - Expositio 
D. - Descripti< 
P. - Persuasioi 
0. - Other (B. 

1 

Evaluation Mec 
S. - Spelling | 
C. - Capitalize 
P. - Punctuatic 
L* - Language ! 
U. - Usage | 
H. - Handwritii 


X 

X 
X 

Abbre 

the it 
nivers 
tor 

n 

on 

n 

L. - B 

l 

nanics 

ation 
Dn 

Struct 


X 

viations 

ems for 
ity Pers 

usiness 
are 


X 

X 
X 

the wri 
onnel 

Letter, 


X 
X 

ting sampl 
Mess. - f 


X 
X 

e? 

essage 


State, Clsrm 
teacher & 
Pri. Con. 

MAEP 
) 


X 

X 
X 


X 


X 
X 


X 
X 


X 

spec 
Task 


Yes 
Yes 

Yes 

No 

Yes 


X 
X 


X 

X 
X 


X 
X 

X 
X 


X 

X 
X 


X 
X 

X 
X 


X 



TABLE 10 

Development of Writing Tasks in School Systems 



School 


Who c 


ievelopec 


1 the it 


.ems for the writing sample? 


Type 


Eval. 


Eval. Mechanics 




Uni. 


































System 


SDE 


Pers. 


Con. 


Teacher 


LEA 


Other 


N. 


E. 


D. 


P. 


0. 




S. 


c. 






n 




r\i\ f 






































Little Rack 






































AZ,






































Phoenix 








X 


X 




X 


X 


X 






Yes 


X 


X 


x 


x 


x 


X 








































Monterey 








X 








x 


x 






ies 


v 
A 


v 
A 


Y 
A 


Y 
A 


v 
A 


v 
A 


FL, 






































Tallahassee 




X 




X 


X 




X 


X 


X 






Yes 


X 


X 


X 


X 


X 


X 


IA,






































Des Moines








X 


X 










X 




Yes 


X 


X 


x 


X 


x 


x 


en 
vo 

Tr 

Chicago 








X 






X 


X 


X 


X 


Pro/ 
Supp 


xes 


X 


X 


X 


X 


X 


x 


OH 

Cincinnati 








X 




Super./ 
Review P. 










X 


Vdc 


X 


X 


X 


X 


X 


X 


Mn 






































Baltimore 








X 


X 






X 




X 




Vpe 

ICO 


X 


X 


X 


X 


X 


x 


MA 






































Boston 


X 






X 






X 


X 


X 






Vac /U 














MT 






































Detroit 






X 






Dept./L.A. 




X 








Yes 


X 


X 


X 


X 


X 


X 


NC, 






































Raleigh 






X 


X 


X 




X 


X 


X 






Yes 


X 


X 


X 


X 


X 


X 


NM, 

Santa Fe 








X 






1 


1 


D 

1 


1 


tess 
3.L. 

1 




1 


1 


1 


1 


1
TABLE 10 (Cont.)
Development of Writing Tasks in School Systems 



School 
System 



New York 
TX, 

Austin 
WA, 

Seattle 
WY, 

Laramie 



Who developed 



SDE 



Uni. 
Pers. 



the items for the writing sample? 



Con. 



Teacher 



X 



LEA 



Other 



Clsrm T./ 
Curr. Sp. 

Com. of 
Local and 
State Uni. 
members 



Explanation of Abbreviations

Who developed the items for the writing sample?
Uni. Pers. - University Personnel
Con. - Contractor

Type

N. - Narration
E. - Exposition
D. - Description
P. - Persuasion
O. - Other (B.L. - Business Letter, Mess. - Message)

Evaluation Mechanics
S. - Spelling
C. - Capitalization
P. - Punctuation
L. - Language Structure
U. - Usage
H. - Handwriting



Type 



N. 



X 



E. D. P. 0. 



X 



X 



B.L. 



Var. 



Let. 



Eval. 



Yes 



Yes 



X 



Eval. Mechanics 



X 



C. P. 



X 



L. 



X 



X 



u. 



X 






TABLE 10 (Cont.) 
Development of Writing Tasks in School Systems 



School System

Who developed the items for the writing sample?
SDE | Uni. Pers. | Con. | Teacher | LEA | Other

Type
N. | E. | D. | P. | O.

Eval.

Eval. Mechanics
S. | C. | P. | L. | U. | H.


* WI, 
Madison 

NM, 

Albuquerque 
KS, 

Wichita 
"Explanation of 

i 

Who developed 
Uni. Pers. - U 
Con. - Contrac 

Type 

N. - Narration 
E. - Expositioi 
D. - Descriptic 
P. - Persuasia 
0. - Other (B.) 

I 

Evaluation Med 
S. - Spelling | 
C. - Capitalize 
P. - Punctuatic 
L. - Language i 
U, - Usage | 
H. - Handwritir 


X 
X 

Abbre 

the it 
nivers 
tor 

n 

Dn 
i 

Li. - B 
1 

ianics 

ation 

Jtructi 


X 

viations 

ems for 
ity Pers 

usiness 
jre 


the wri 
onnel 

Letter, 


X 
X 
X 

ting sampl 
Mess. - M 


e? 

essage 

l 


Parent/Bus . 
People 

Coord, of 
Lang. Arts 

) 


X 


X 
X 
X 


X 


X 
X 


BUS. 
VOC. 


No 

Yes 

Yes 


X 
X 


X 
X 


X 
X 


X 
X 


X 
X 


X 
X
TABLE 11

Objective Testing in State Assessment Programs 



State | Obj. Test | M.S.

If an objective test is used, which of the following domains are measured at the designated grade levels?
Content | Organization | Style | Spelling | Punctuation | Capitalization | Syntax


AL 


No 


















CA 


Yes 


No 


M 


M 

1*1 


m a 


n ii p 

P,M,S 


P,M,S 


P,M,S 


D M C 


HI 


No 


















ID 


NO 


















LA 


Yes 


Yes 


P,J,S 


P,J,S 


PfJfS 


P,J,S 


P,J,S 


P,J,S 


T) 7 O 


FJ2 




















MA 


No 


















MI 


U.D. 


















MN 


NO 


















NH 


NO 


















NJ 


Yes 


Yes 




s 




S 


s 


s 




NM 


NO 


















NV 


Yes3,6 
No9-12 


No 
No 








P,M 


P,M 


P,M 




OH 


Yes 


Yes 








J,S 


J,S 


J,S 




OR 


Yes 


No 




M 7 




M,J 


M,J 


M,J 




RI 


Yes 


No 








P,M 


P,M 


P,M 




SC 


NO 


















WY 


NO 














1 
1
TABLE 11 (Cont.)
Objective Testing in State Assessment Programs

State | Obj. Test | M.S.

If an objective test is used, which of the following domains are measured at the designated grade levels?
Content | Organization | Style | Spelling | Punctuation | Capitalization | Syntax


NC 
TX 
PA 
DE 
FL 
MD 

Explanation of 

M.S. - Minimum 

P. - Primary 
M. - Middle 
J. - Junior Hi< 
S. - Senior Hi( 


No 

Yes 

Yes 

Yes 

Yes 

Yes 

Abbrev 
1 

Standa 

3h 
3h 


Yes 

No 

No 

Yes 

Yes 

iation 
rds 


M,J,S 
S 

s 


M,J,S 

P,M,0,S 
S 


M,J,S 


P,M,S 

P,M,J,S 
P,M,J,S 


P,M,S 
M,J,S 
P,M,J,S 
P,M,J,S 


P,M,S 
M,J,S 
P,M,J,S 
P,M,J,S 


P,M,S 
M,J,S 
P,M,J,S 
P,M,J,S 



TABLE 12

Objective Testing in School Systems



School System | Obj. Test | M.S.

If an objective test is used, which of the following domains are measured at the designated grade levels?
Content | Organization | Style | Spelling | Punctuation | Capitalization | Syntax


AR, 

Little Rock 


i 

Yes 

i 


NO 








P,M, J,S 


P,M,J,S 


P,M,J,S 




AZ,
Phoenix


Yes

! 


Yes 


o 
o 


b 




S 


S 


S 


s 


CA,

Monterey 


i

No






i 












FL,

Tallahassee 


I 

Yes 


Yes 


J 


M,u 


P,M,J 


P,M, J 


P,M,J 


P,M,J 




GA, 

Atlanta 


Yes


Yes 




\ 

m t\ o 

\ 

\ 




P,M,J,S 


P,M, J,S 


P,M,J,S 


P,M,J,S 


IA,

Des Moines


No
















IL, 

Chicago 


NO 

\ 

i 

Yes 






\ 
\ 












OH 

Cincinnati 


Yes 

\ 

\ 

Yes 




V 

PfM \ 


M, J,S 


P,M,J,S 


P,M,J,S 


P,M,J,S 


P,M,J,S 


MD, 

Baltimore 


Yes 








P,M, J,S 


P,M,J,S 


P,M,J,S 


P,M,J,S 


MA, 

Boston 


No 






\ 












Detroit 


Yes 


Yes




\ 




S 


S 


S 


D 


NC, 

Raleigh 


Yes 
CRT | 


\ 


\ 




1 
1 


P,M,J,S | 
1 


P,M,J,S | 
I 


P,M,J,S | 
1 


P,M,J,S 



\ 



TABLE 12 (Cont.) 
Objective Testing in School Systems 



School System | Obj. Test | M.S.

If an objective test is used, which of the following domains are measured at the designated grade levels?
Content | Organization | Style | Spelling | Punctuation | Capitalization | Syntax


NM, 

Santa Fe 
NY, 

New York 
TX, 

Austin 
WA, 

Seattle
WY, 

Laramie 

Explanation of 

M.S. - Minimum 

P. - Primary 
M. - Middle 
J. - Junior Hi< 
S. - Senior Hi< 

i 


No 
No 
Yes 
No 

No 

Abbrev 
I 

Stand a 

3h 
3h 


Yes 

iation 
rds 


P,M,S 

s 


P,M,S 




P,M,S 


P,M,S 


P,M,S 


P,M,S 



/ 



TABLE 13 

Reporting of Writing Assessment Results by State Agencies 



How are results reported? 



State | Subj. | Domain | Obj. | Item | Student | Clsrm | Grade Sch. | Grade LEA | Grade SDE | Perc. Co. | Percent | Stanine | Sta. Score | Other




AL 






X 




X 










X 












CA 




X 










X 






X 


X 


X 




Latent Trait Theory 


• 


HI 






X 










X 












Percentage 4-1 




ID 


X 








X 




X 


X 


X 


X 












LA 


X 


X 


X 


X 


X. 


X 


X 


X 


X 


X 












ME 






X 












X 


X 












MA 


X 








X 




X 


X 


X 










Holistic 


MI 
MN 
NH 


x 




Y 


<\uu • 
X 










X 
X 


X 








Other analyses 

Des. of high of 
low papers 




NJ 
NM 


X 




X 


X 


X 
X 


X 


X 


X 


X 


X 

* 






X 


Pass /Fail 




NV 


X 


X 


X 


X 


X 




X 


X 


X 


X 


X 


X 


X 


Holistic 




OH 


X 




X 


X 










X 










% of student & 
categories 




OR 
RI 




X 


X 
X 


X 
X 


X 


X 


X 




X 

X \ 


X 

<8,10 


X4,6 






Writing ex. 1-4 




SC 


X 




X 




X 


X 


X 


X 


X 


X 








Raw Score 




WY 




X 






X 






X 


X
TABLE 13 (Cont.)
Reporting of Writing Assessment Results by State Agencies



How are results reported? 



School System | Subj. | Domain | Obj. | Item | Student | Clsrm | Grade Sch. | Grade LEA | Grade SDE | Perc. Co. | Percent | Stanine | Sta. Score | Other



— i 



MD 

FL 

DE 
PA 
TX 
NC 



X 
X 
X 



Explanation of Abbreviations 



Subj. - Subject area 
Obj. - Objective 
Clsrm - Classroom 
Perc. Co. - Percentage correct 
Sta. score - Standard score 



X 
X 



X 
X 
X 
X 



X 
X 
X 
X 



X 
X 
X 
X 



X 
X 



Holistic 

% mastery of 
standards 



% mastery 
undecided
TABLE 14

Reporting of Writing Assessment Results in School Systems 



How are results reported? 



School System | Subj. | Domain | Obj. | Item | Student | Clsrm | Grade Sch. | Grade LEA | Grade SDE | Perc. Co. | Percent | Stanine | Sta. Score | Other


AR, 

Little Rock


v 

A 


v 
A 






v 
A 




v 
A 


X 




X 


X 


X 




Growth Scale 


AZ, 

Phoenix






v 
X 




v 
A 


A 


A 


X 




X 








No of skills passed 


CA, 

Monterey




v 
A 






Y 

A 




v 
A 


v 
X 












Student rating pro 
by oomp. domain 


FL,

Tallahassee


V 

A 


v 
A 


V 
A 




Y 
A 


v 
A 


v 
X 


v 
X 












% ach. of objective 


GA, 


Y 
A 




Y 
A 




Y 
A 


Y 
A 


Y 
A 


Y 

X 




Y 

X 










IA, 

Des Moines






v 
A 


v 
A 


v 
A 


X 








X 








Item 


IL, 

Chicago


v 
A 








v 
A 










X 










OH, 

Cincinnati 


v 
A 








v 
A 


v 
A 


X 






> 

X 










MD, 

Baltimore 






X 


X 


X 


X 


X 






X 










MA, 

Boston 


X 








X 


















Holistic 


MI, 

Detroit 


X 




X 




X 




X 


x 




X
TABLE 14 (Cont.)
Reporting of Writing Assessment Results in School Systems 



How are results reported? 



School 
System 


Subj. 


Domain 


Obj. 


Item 


Student


Clsrm


Grade 
Sch. 


Grade 
LEA 


Grade 
SDE 


Perc. 
Co. 


Percent 


Stanine 


Sta. 
Score 


Other 


NC, 

Raleigh 






X 




X 


















Objective mastered 


NM, 

Santa Fe 


X 














X 












Pass/Fail 


NY, 

New York 










X 




X 






X 










Austin 


X 




X 


X 


X 






X 


X 


X 






X 




WA, 

Seattle 








X 


X 


















Satisfactory/Unsa 


WY, 

Laramie 


X 














X 












Holistic
TABLE 14 (Cont.)
Objective Testing in School Systems 



o 



School 
System 


Obj. 
Test 


M.S. 


NM, 

Santa Fe 


No 




NY, 

New York 


No 




TX, 

Austin 


Yes 


Yes 


WA, 

Seattle 


No 




WY, 

Laramie 

i 


No 





If an objective test is used, which of the 
designated grade levels? 
Content 



following domains are measured at the 



P,M,S 



Organization 



Style Spelling 



P,M,S 



Explanation of Abbreviations 

I I 

M.S. - Minimum Standards 



P. - Primary 
M. - Middle 
J. - Junior High 
S. - Senior High 



P,M,S 



Punctuation 



P,M,S 



Capitalization I Syntax 



P,M,S 



P,M,S
TABLE 14 (Cont.)
Objective Testing in State Assessment Programs 



School 
System 


Obj. 
Test 


M.S. 


If an objective test is used, which of the following domains are measured at the
designated grade levels? 


Content 


Organization 


Style 


Spelling 


Punctuation 


Capitalization 


Syntax 


KS, 

Wichita 
NM, 

Albuquerque 
OR, 

Portland 
WI, 

Madison 
Explanation of 

i 

M.S. - Minimum 

P. - Primary 
M. - Middle 
J. - Junior Hi 
S. - Senior Hi< 


Yes 

res 4,6 
to 10 

Yes 

Yes 

Abbrev 
Standa 

gh 
gh 


No 

,9 

Yes 
No 

iation 
rds 


P,M,J,S 
P,J,S 

s 


P,M,J,S 
P,M,J,S 


P,M,J,S 
P,M,J,S 


P,M,J,S 
P,M,J,S 

P,M,J,S 
P,J,S 


P,M,J,S 
P,M,J,S 

P,M,J,S 
P,J,S 

♦ 


P,M,J,S 
P,M,J,S 

P,M,J,S 
P,J,S 


P,M,J,S 
P,M,J,S 

P,M,J,S

Summary of National Conference on Writing Assessment



In order to probe more deeply into the nature of large-scale
assessment in the nation, the investigators planned a conference to which
individuals involved in large assessment programs were invited.
The survey responses were classified according to the assessment programs
described below:

1. States and local Agencies that use a Writing Sample

a. Holistic Scoring 

b. Primary Trait 

c. Analytical 

2. States and local Agencies that do not use a Writing Sample 

After the responses were classified, a random sample was drawn which
was representative of the types of writing assessment programs
indicated by the survey. A representative from each program identified
in the random selection was invited to attend a Writing Assessment
Conference in New Orleans on July 10. Each representative accepted the
invitation. Listed below are the systems which were represented at the
conference. After the name of each system is a description of the type of
assessment program used in that system.

    Delaware                   Primary Trait
    Pennsylvania               No Writing Sample
    North Carolina             Holistic/Analytical
    Florida                    Analytical
    Texas                      Focused Holistic
    Maryland                   In process of developing the program
    Madison, Wisconsin         Holistic
    Wichita, Kansas            Holistic
    Portland, Oregon           No Writing Sample
    Albuquerque, New Mexico    Holistic

In addition to the identified participants, other persons attended
representing the states of Arkansas, Louisiana, Texas and South Carolina.
A complete listing is provided in the Appendix.

The conference was planned using a discussion format. No planned
speeches were presented. The investigators led the discussion following
the questions indicated on the agenda provided on the next pages.
Following the agenda is a summary of the ensuing discussion.

AGENDA

NIE - LOUISIANA TECH WRITING CONFERENCE

FEASIBILITY OF ASSESSING WRITING 
USING MULTIPLE TECHNIQUES 

Hotel Marie Antoinette 
New Orleans, Louisiana 
July 10, 1981 

- Conference Overview 

This conference is planned to facilitate discussion. There are
no planned speeches. The following questions have been
presented for consideration.



1. What are the purposes of a large-scale writing assessment?

2. What information is obtained by a direct measure of writing 
that is not available from indirect measures? 

3. How are writing samples scored efficiently and reliably?
a. What are the advantages of the various methods of
assessing written composition (holistic, primary trait,
and analytical)?

1. What information is yielded by each scoring
technique?

2. How is the information reported and utilized? 

3. Can this information be utilized to improve
instruction? How?

4. How does this information compare with information 
yielded by indirect measures? 

b. How is mechanics scored? 

c. How are writing tasks developed? 

1. What kinds of writing tasks are developed? 

2. How many tasks are needed to secure a reliable and 
valid sample of writing? 

3. How are tasks developed? 

4. What types of prompts elicit the best responses? 

5. How much time is allotted for students to write? 

4. The Scoring Process 

a. How are scorers selected and trained? 

b. Describe the scoring process. 

c. How is scorer reliability maintained?

d. What is time involved in scoring using each of the 
techniques? 

e. What is the cost involved in scoring using each of the
techniques?

5. How is the scoring of large numbers of writing samples
managed?

a. What management plans have been used successfully to 
score the entire population? 

b. What sampling techniques yield the best results in 
describing the status of writing in a given 
population? 

c. How does an agency maintain a reference point from year
to year when scoring writing samples?

Conclusions 

Considering time, cost, and information yielded, is the 
scoring of a writing sample in large-scale assessment cost 
effective? 

How does an agency make the cost/benefit decision on 
whether to test writing by direct or indirect measures? 

How can both approaches be integrated?

Discussion Summary

This section presents a summary of discussion stimulated by each 
question on the agenda. 

1. What are the purposes of a large-scale writing assessment? 
a. Who are audiences? 



Practitioners agreed that the major purposes of large-scale writing
assessment are fourfold, as follows:

1. improvement of instruction;

2. assurance that each child has acquired the basic skills in 
writing; 

3. determination of the status of writing in a given state or 
system; 

4. provision of quality education. 

2. What information is obtained by a direct measure of writing that 
is not available from indirect measures? 

This question generated a great deal of discussion, which included
a number of opinions. There appeared to be a consensus that certain
qualities of writing ability can be measured through the use of objective
tests. Research studies were cited which demonstrate a high correlation
between direct and indirect measures. Lutz indicated that a
multiple-choice item can be constructed to measure a given writing skill;
however, a problem inherent in the construction of the item is the
establishment of the parameters of the item. Apparently, this is an area
of needed research.
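
The "high correlation" cited here is conventionally expressed as a Pearson coefficient between each student's direct (writing sample) score and indirect (objective test) score. The following is a minimal sketch of that computation, not a procedure reported at the conference; the function name and the score lists are hypothetical and serve only as illustration.

```python
from math import sqrt


def pearson_r(direct, indirect):
    """Pearson correlation between paired direct and indirect scores."""
    n = len(direct)
    if n != len(indirect) or n < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mean_d = sum(direct) / n
    mean_i = sum(indirect) / n
    # Sums of cross-products and squared deviations from the means.
    s_di = sum((d - mean_d) * (i - mean_i) for d, i in zip(direct, indirect))
    s_dd = sum((d - mean_d) ** 2 for d in direct)
    s_ii = sum((i - mean_i) ** 2 for i in indirect)
    if s_dd == 0 or s_ii == 0:
        raise ValueError("correlation is undefined when one measure is constant")
    return s_di / sqrt(s_dd * s_ii)


# Hypothetical data: essay scores on a 0-4 scale and objective-test raw scores.
essay_scores = [1, 2, 2, 3, 4]
objective_scores = [12, 15, 14, 18, 20]
r = pearson_r(essay_scores, objective_scores)
```

A coefficient near 1 would indicate that the two measures rank students similarly; it would not show that they measure the same skills.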

Systems utilizing direct measurement indicated that while their 
assessment programs were mandated, the decision to use direct measures of 
writing resulted from recommendations by professionals. There seemed to be
a consensus among participants that certain qualities can be measured only
by use of a writing sample. Purves asserted that the general public does 
not tend to accept the results of objective tests as an indication of 
writing ability. One participant indicated that a writing sample offers 
more "legal evidence" of writing ability than do objective tests. On the 
other hand, a participant from a large city school system argued that for 
an indirect measure to be "worth anything" the results must get back to the 
school level. The cost of scoring a writing sample for every student is 
prohibitive. Therefore, the school system elected not to make use of a 
writing sample rather than resort to the use of a random sample. 

The participants agreed that the ideal situation would be one in which
every paper is scored. However, time and cost prohibit the scoring of a
writing sample for an entire population of students. Most participants
tended to agree that although the information yielded by a writing sample
could only be generalized to a population, the scoring of writing from a
sample of the population is essential to the writing assessment program,
primarily because of the instructional message to teachers. If teachers
know that students' writing ability will be evaluated by means of a direct
measure, teachers will require more writing production from students.

3. How are writing samples scored efficiently and reliably?

a. What are the advantages of the various methods of assessing
written composition (holistic, primary trait, and analytical)?

1. What information is yielded by each scoring technique?

2. How is the information reported and utilized?

3. Can this information be utilized to improve instruction?
How?

4. How does this information compare with information yielded
by indirect measures?

b. How is mechanics scored?

c. How are writing tasks developed?

1. What kinds of writing tasks are developed?

2. How many tasks are needed to secure a reliable and valid 
sample of writing? 

3. How are tasks developed? 

4. What types of prompts elicit the best responses? 

5. How much time is allotted for students to write? 

4. The Scoring Process 

a. How are scorers selected and trained? 

b. Describe the scoring process. 

c. How is scorer reliability maintained? 

d. What is time involved in scoring using each of the techniques? 

e. What is the cost involved in scoring using each of the 
techniques? 

To introduce the discussion of this question, each of the three
consultants reviewed the theory and rationale underlying each of the three
major scoring techniques as well as the procedures and guidelines to be
followed in implementing each system. Following the discussion by the
three consultants, various participants discussed the scoring procedures as
implemented in their own systems. A summary of that discussion follows.
A. One state representative reported a scoring technique identified as
"focused-holistic." The discourse model used included the following modes
of writing: informational, persuasive, and expressive. Focused-holistic
scoring was defined as being a "criterion-referenced form of holistic
scoring." A five-point scoring guide ranging from 0 (not scorable) to 4
(excellent) was defined for each writing mode and was based on statewide
purposes, a specified audience, and the constraint of the text. The
participant displayed a 100-page scoring document which contained scoring
guides, papers illustrating each score point, and directions for use. The
scoring was accomplished by a contractor who employed scorers who met
qualifications specified by the SDE. To score the state assessment, it
takes 100-155 readers seven to eight weeks working full time.
Approximately 40 papers are read per hour. Total cost was estimated to be
12.71 per student. However, the scoring of the writing sample was a part
of the larger state assessment contract.

B. A second state is involved in the developmental stage of a
procedure identified as analytical holistic. Students were given a choice
of two options. The writing task was defined in the prompt. Papers were
scored holistically following procedures outlined by ETS. Following the
holistic scoring of all items on a paper, analytic scoring was conducted on
an item-by-item basis. Each item was scored analytically by one scorer.
Scorers were teachers employed by the contractor. Scorers were able to
score one paper per minute holistically and one paper every three minutes
analytically. Cost was estimated to be approximately $10 per pupil.
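
The throughput figures reported for states A and B lend themselves to a rough capacity check. The following is a minimal sketch, not a method described by the participants; the function names and the 40-hour work week are assumptions introduced for illustration (the report says only "full time"), and the totals count readings rather than distinct papers, since the report does not say how many times each paper was read.

```python
def readings_completed(readers, weeks, papers_per_hour, hours_per_week=40):
    """Estimate the total number of readings a scoring team completes.

    hours_per_week=40 is an assumption; the source says only "full time".
    """
    return readers * weeks * hours_per_week * papers_per_hour


def combined_rate_per_hour(minutes_holistic, minutes_analytic):
    """Papers per hour when each paper gets a holistic and an analytic reading."""
    return 60 / (minutes_holistic + minutes_analytic)


# State A: 100-155 readers, seven to eight weeks, about 40 papers per hour.
low_estimate = readings_completed(readers=100, weeks=7, papers_per_hour=40)
high_estimate = readings_completed(readers=155, weeks=8, papers_per_hour=40)

# State B: one minute per paper holistically, three minutes analytically.
state_b_rate = combined_rate_per_hour(1, 3)
```

Under these assumptions a single scorer in state B would complete both readings for about 15 papers per hour, which illustrates why adding analytic scoring raises the per-pupil cost.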

C. A third state participant described scoring procedures which
paralleled the primary trait scoring procedures implemented by NAEP. Items
developed by NAEP were used. General comparisons between the national
performance and the state performance could then be made.

In addition to primary trait scoring, an expressive item was scored 
for secondary traits and the persuasive writing essay for writing 
mechanics. All scoring was accomplished by a contractor. No cost estimate 
was provided by the participant. 

D. A fourth state participant described an analytical scoring
procedure. A unique feature of this state's plan was the fact that
selected teachers from all over the state were trained as scorers. One
scorer could score approximately 50 papers. The cost of the program was
estimated to be about $10 per student.

E. Local school systems in attendance also used a variety of methods.
One LEA indicated that a form of "holistic" scoring was used. All teachers
were dismissed one-half day to score papers. A second school system
represented was in the developmental stage. The participant's comment was
simply that, after hearing the other reports, he had to go
home and plan some more. A third LEA used a contractor.

5. How are writing tasks developed? 

a. How many tasks are needed to secure a reliable and valid 
sample of writing? 

b. How are tasks developed? 

c. What types of prompts elicit the best responses?

The number and kind of writing tasks appeared to vary with the scoring 
technique. Those systems utilizing primary trait scoring tended to develop 
writing tasks which were narrow, clearly defined, and audience-oriented. 
Since primary trait tasks are dependent on the writing mode and since such
tasks tend to stimulate a short writing sample, usually more than one task
was assigned.

On the other hand, tasks which were developed for holistic scoring
tended to be of a more open-ended nature. Such tasks were generally designed
to stimulate a twenty-minute essay. Therefore, usually only one task was
presented.

Most systems described similar procedures for developing the writing
task which paralleled procedures indicated by the results of the
questionnaire. Once the method of evaluation was determined, tasks were
designed by committees. In most cases these tasks were subjected to
field-testing before being utilized in a state assessment program. However,
the methods employed in analyzing the results from the field test differed
from system to system. Many looked at the percentage of agreement obtained by
scorers as well as the nature of the responses from students.
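
The "percentage of agreement obtained by scorers" can be computed directly from paired readings of the same field-test papers. The following is a minimal sketch under stated assumptions: the function name and score lists are hypothetical, and the "adjacent agreement" figure (scores within one point) is a common companion statistic added here for illustration, not one the participants reported using.

```python
def agreement_rates(scores_a, scores_b):
    """Return (exact %, adjacent %) agreement between two scorers.

    Exact agreement counts identical scores; adjacent agreement counts
    scores that differ by at most one point (an illustrative addition).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    pairs = list(zip(scores_a, scores_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact * 100, adjacent * 100


# Hypothetical readings of eight papers on a 0-4 scoring guide.
reader_1 = [4, 3, 2, 0, 1, 3, 4, 2]
reader_2 = [4, 2, 2, 0, 3, 3, 4, 1]
exact_pct, adjacent_pct = agreement_rates(reader_1, reader_2)
```

A task whose field-test papers produce low exact agreement might be revised or dropped before statewide use, although the report notes that the criteria applied differed from system to system.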

A variety of topics had been used by the participants. There appeared 
to be a general agreement that topics for the younger child must require 
very little information to be generated, whereas topics for the older 
student required that more information be generated. Participants did not
agree on whether an audience should be specified. As might be expected, those
favoring primary trait scoring also favored the specification of an
audience. Opinions varied among those favoring holistic scoring. Some
felt that the specification of the audience was nothing more than a
"fabrication": the student was aware that the "real" audience was the
teacher or the scorer, as the case may be. One state defined the audience
for the writer simply as that person who would score the writing.

Dr. Purves questioned if topics or scoring dealt with cultural 
differences. He was especially concerned that in the field test, if a 
topic was found to show undue bias, it was discarded. The culture of the 
population should be considered in designing the task. 

6. How is the scoring of large numbers of writing samples managed? 
a. What management plans have been used successfully to score the
entire population?
b. What sampling techniques yield the best results in describing
the status of writing in a given population?

A number of procedures have been used to manage the scoring of large 
numbers of papers. Several systems depend upon a contractor. One state 
brings together teachers selected from throughout the state to score the 
writing samples. Only a description of the status of students' writing
could be generalized to the total population and a classroom report is
impossible.

All agreed that to effectively impact instruction, the information 
must be returned for each school, each classroom, and each student. Dr. 
Purves suggested that a random sample of writing responses be scored at the 
state level as an anchor with the balance of the writing samples being 
returned to the school system from which they originated. At the system 
level a sample might be scored and the rest returned to individual schools 
for classroom teachers to score. With this plan all teachers would be 
involved in the scoring process thereby creating a situation which might 
impact instruction. The sample serving as an anchor at the state level 
would enable comparisons to be made between system samples and state 
samples, school and system samples, and school and state samples. 
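
The anchor plan suggested by Dr. Purves amounts to drawing a random subset of papers for state-level scoring and routing the remainder back to the originating systems. The following is a minimal sketch of that split, assuming simple random sampling without replacement; the function name, the seed, and the paper identifiers are hypothetical and are not part of the plan as described.

```python
import random
from collections import defaultdict


def split_for_anchor(papers, anchor_size, seed=1981):
    """Split papers into a state-level anchor sample and per-system returns.

    papers: list of (system, paper_id) tuples.
    Returns (anchor, returned), where returned maps system -> paper ids.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    anchor = rng.sample(papers, anchor_size)
    anchor_set = set(anchor)
    returned = defaultdict(list)
    for system, paper_id in papers:
        if (system, paper_id) not in anchor_set:
            returned[system].append(paper_id)
    return anchor, dict(returned)


# Hypothetical population: three systems with 100 papers each.
papers = [(f"system_{s}", n) for s in range(3) for n in range(100)]
anchor, returned = split_for_anchor(papers, anchor_size=30)
```

The same function could be applied again at the system level, with each system scoring its own sample and returning the balance to schools for classroom teachers to score.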
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CHAPTER 4 
CONCLUSIONS AND RECOMMENDATIONS 



A central question underlying this research has been related to the 
necessity of a writing sample in a large-scale writing assessment. 
Apparently, numerous authorities in the field of English contend that a
writing sample is necessary. It is the opinion of these proponents of a
writing sample that objective tests do not measure the higher order 
thinking skills which are reflected in the student's ability to select and 
explore content, achieve a style and tone appropriate for a designated 
audience and organize content logically. Concerning mechanics, the 
proponents argue that objective tests only measure the student's ability to 
proofread and do not measure his ability to use mechanics correctly. The 
findings in this research indicated that there was no relationship between 
the scores of Louisiana students on the Louisiana State Assessment Test in 
Writing (an objective test) and the scores received on the writing sample 
which was scored by a variation of the Primary Trait System of scoring. 
However, a comparison of the skills measured on the objective test and the 
skills rated on the writing sample revealed that the two measures were not 
evaluating the same skills. Therefore, no conclusion can be drawn from
these findings.

The scoring of mechanics appears to pose a problem for individuals 
implementing a large-scale writing assessment. For holistic scoring, a 
general impression of the student's ability to use the mechanics correctly 
is a consideration in assigning a score. A similar technique is applied in 
analytical scoring except that a separate score for mechanics is assigned. 
NAEP scores mechanics by determining the percentage of errors. The 
administrators in Louisiana, however, attempted to develop a criterion 
referenced system for scoring mechanics. The minimum competencies were 
defined and the students' use of mechanics was scored in terms of these 
minimum competencies. A separate score was assigned for each of the 
following: punctuation, capitalization, spelling, and syntax. If a 
student violated one convention specified as a minimum skill the student 
was scored "below minimum." The problem with this system was that a 
writing task does not necessarily stimulate all students to attempt the 
same mechanical conventions. Therefore, a "below minimum" score does not 
necessarily mean the same thing for all students. One student may write 
only a few sentences which require no internal punctuation. This student 
is assigned a minimum score. On the other hand, another student may write
a lengthy, well-developed passage, attempt a number of internal punctuation
marks, and misuse one of them. This student is assigned a "below minimum"
score. Is the student really "below minimum" in his ability to use
punctuation marks? How does he compare with the student who attempted no
punctuation marks?
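
The contrast drawn in this paragraph can be made concrete by scoring the same two students under both approaches. The following is a minimal sketch, not the Louisiana or NAEP procedure itself: the function names and the attempt and error counts are hypothetical, and the use of attempted punctuation marks as the denominator for the error percentage is an assumption, since the report does not say how NAEP computes it.

```python
def criterion_score(errors):
    """Rule described in the text: one violation gives 'below minimum'."""
    return "below minimum" if errors > 0 else "minimum"


def error_percentage(errors, attempts):
    """Percent of attempted marks used incorrectly; None if none attempted.

    Using attempts as the denominator is an assumption for illustration.
    """
    if attempts == 0:
        return None
    return 100 * errors / attempts


# Hypothetical students: one attempts no internal punctuation, the other
# attempts twelve marks and misuses one.
brief_writer = {"attempts": 0, "errors": 0}
fluent_writer = {"attempts": 12, "errors": 1}

brief_result = (criterion_score(brief_writer["errors"]),
                error_percentage(**brief_writer))
fluent_result = (criterion_score(fluent_writer["errors"]),
                 error_percentage(**fluent_writer))
```

Under the criterion rule the brief writer passes and the fluent writer fails, while the percentage view shows that the brief writer supplied no evidence at all and the fluent writer was correct on eleven of twelve attempts.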

The question remains: can mechanics be scored on a writing sample in
a large-scale assessment program by a system other than one dependent on
scorer impressions?

A review of the literature did not appear to indicate that sufficient
research has been conducted either to support the need for a writing sample
or to indicate the power of the objective test for predicting writing
ability. The classic study conducted by ETS did indicate that a forty-five
minute objective test provided as much information as three forty-five
minute essays (Godshalk, Swineford and Coffman, 1966). Dr. William Lutz,
Professor of English at Rutgers and consultant for this study, contended
that items could be developed that might predict a
student's ability to write. At this time, more research is needed to
determine the power of a score on an objective measure of writing skills as
a predictor of writing ability. Until this question is satisfactorily
addressed, the question concerning the necessity of the writing sample
cannot be answered.

The consensus among the participants at the conference seemed to be 
that the major reason for including a writing sample with any assessment of 
ability in written composition was political. The fact that a writing 
sample is required would, most agreed, cause more writing to be included in 
the curriculum. However, doubt was expressed concerning the amount and 
value of information yielded by a writing sample given present reporting 
practices. For example, if primary trait scoring is used, the report of 
scores indicates simply that a certain percent scored minimum on a 
particular item at a particular grade level, a certain percent scored above 
minimum and a certain percent below minimum. Unless the classroom teacher 
is thoroughly familiar with the nature and directions of the item and with 
details of the scoring guide, very little is yielded which can be 
translated into instructional improvements. This situation creates a 
certain irony in that the primary trait scoring system was conceived on the 
premise that specificity of writing objective and corresponding writing 
stimulus and rubric would produce information easily interpreted by the 
practitioner. Indeed, if doubt exists that primary trait scoring is yielding 
information which can be translated into improved instructional procedures, 
support of a pure holistic method of scoring becomes very dubious. By its 
very nature, holistic scoring does not easily yield specific information 
for an item. Analytical scoring presents certain limitations in that the 
scale is not as appropriate to some writing tasks as it is to others. 

Validity of information yielded is affected also by scorer reliability 
which is very difficult to maintain at a level over .70. Reliability 
figures are affected by, among other factors, the nature and complexity of 
the item and the amount of writing produced. With reliability hovering on 
average in the .70 to .80 range, the question must be raised of the 
validity of information yielded. 
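
The report does not say which reliability coefficient the agencies computed. As one common possibility, the sketch below estimates scorer reliability as the Pearson correlation between two readers' scores on the same set of papers. The function name and the scores are invented for illustration and are not data from this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a reader gives one score to all papers")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores (1-4 scale) given by two readers to the same eight papers
reader_a = [1, 2, 2, 3, 4, 3, 2, 1]
reader_b = [1, 2, 3, 3, 4, 2, 2, 2]

print(round(pearson_r(reader_a, reader_b), 2))
```

A coefficient of this kind reflects only how consistently the two readers rank the papers; it says nothing about whether the scores mean what they are taken to mean, which is the validity question raised above.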

At best only a limited random sample can be scored in a statewide 
assessment. Therefore, impact on specific LEAs and their individual 
schools is limited to nonexistent. However, suggestions were 
made of ways to filter assessment results down to the individual school 
level. For example, since every child at designated grade levels is 
tested, pull a random sample to satisfy the legislative mandate of a 
writing assessment. Then return all papers to individual school systems to 
be scored either by designated teachers or by all teachers at certain grade 
levels. Scorer reliability would not be relevant and teachers could 
observe, first-hand, the performance of their students. As a result 
teachers would be in a better position to address student problems related 
to a specific mode of writing. Then, perhaps, instruction would be 
impacted. 

The question of the feasibility of including a writing sample in a 
state-wide or system-wide assessment depends upon first, how the writing 
sample is scored and results reported and second, how the results are 
translated into improved instructional procedures. If a sample is scored 
and reported in order to satisfy a legislative mandate but does not result 
in targeted in-service for teachers with follow-up to check for 
implementation, then scoring of a writing sample would be difficult to 
support. However, if resulting data is sufficiently wrung for information 
and if this information is translated into improved instructional 
practices, scoring a writing sample may indeed be worth the cost involved 
in both time and money. It should be noted that no research has been 
conducted to determine the impact of assessment programs on instructional 
procedures practiced in the classroom. 

Throughout the nation, local and state education agencies are 
concerned with the problem of determining what students can and cannot do 
in the basic skill areas. Writing is one of those concerns. While many 
agencies are attempting to measure the ability of students to write, the 
findings of this study indicate that there is little uniformity in how 
writing ability is measured. Most of the education administrators 
responding to the survey in this study indicated that they believed a 
writing sample was essential. However, the methods used to score the 
writing appeared to vary from agency to agency. Although most of the 
administrators reported using holistic scoring, interviews with 
several of the individuals indicated that holistic scoring techniques 
varied. After hearing Dr. Lutz describe the holistic technique as applied 
by ETS, one administrator remarked that he just thought he was using 
holistic scoring. Apparently, what he used and called holistic scoring was 
nothing like the procedure developed by ETS. The same was true with the 
Primary Trait System. While several agencies reported using the Primary 
Trait System, when the systems were described, each was different from the 
other and from the original system as developed by NAEP. Other agencies 
tended to report such systems as "focused holistic" and 
"analytical-holistic." 

As administrators described the system which was unique to their 
respective agencies, they did so with a great amount of pride and a sense 
of ownership. The procedures described by other participants apparently 
offered no appeal. When violations of basic assumptions were suggested by 
the consultants, administrators tended to ignore these warnings. 

The methods and techniques used to score writing samples in large-scale 
assessment are simply too varied to draw any but the most general 
conclusions about their value. A number of questions, both technical and 
metaphysical, remain unanswered. The nature of the writing tasks which are 
designed to promote writing presents a major area of needed research. Very 
little is known at this point about how carefully norms are determined, how 
stable the results are, how the results might advantage one group over 
another, nor how accurate their predictive value is. Because competency 
testing programs are being used to make important decisions about people's 
lives, it is imperative that administrators of testing programs, as well as 
the users, scrutinize their programs with care. 

Project Abstract 



The purpose of this study was to determine the "state of the art" of 
large-scale writing assessment in the nation. With the onset of the 
accountability movement, many state and local education agencies have been 
mandated to assess all of the basic skills, with written composition 
presenting the most formidable problems. These problems have resulted in the 
establishment of many varied approaches to the assessment process. In order 
to ascertain the status of large-scale assessment of writing in the nation, 
all fifty state education agencies and selected large city agencies were sent 
a questionnaire related to ongoing practices being used in assessment 
programs. From those responding to the questionnaire, ten participants 
representing varying assessment philosophies were selected to attend a 
conference on writing assessment in New Orleans. The purpose of this 
conference was to provide both verification and clarification of 
questionnaire findings. As a result of both the questionnaire and the 
conference, the researchers drew conclusions concerning such problems as the 
selection of scoring procedures, development of items, selection and 
training of scorers, value and utilization of information yielded by a 
writing sample, scoring of mechanics, and other problems associated with the 
implementation of a large-scale writing assessment utilizing both objective 
and applied procedures. 

List of City School Systems Receiving a Questionnaire 

Louisiana Scoring Guides for Writing Samples (Mechanics) 

Key Participants and Consultants in National Writing Conference 

List of City School Systems Receiving a Questionnaire 

Atlanta, GA 

Mr. Alonzo Crim, Superintendent 
Ind. School District 203 
224 Central Avenue, S.W. 
Atlanta, GA 30303 

Miami, FL 

Mr. E. L. Whigham, Superintendent 
1410 N.E. Second Ave. 
Miami, FL 33132 

Chicago, IL 

Mr. James F. Redmond, Superintendent 
228 North La Salle Street 
Chicago, IL 60601 

Houston, TX 

Mr. George G. Garver 

Superintendent of Houston ISD 

3800 Richmond 

Houston, TX 77027 

Dallas, TX 

Mr. Nolan Estes 

Superintendent of Dallas ISD 

3700 Ross Ave. 

Dallas, TX 75204 

Detroit, MI 

Mr. Charles J. Wolfe, Superintendent 
5057 Woodward 
Detroit, MI 48202 

Cincinnati, OH 

Mr. Donald R. Waldrip, Superintendent 
230 E. 9th St. 
Cincinnati, OH 45202 

Des Moines, IA 

Mr. Dwight M. Davis, Superintendent 

1800 Grand Ave. 

Des Moines, IA 50307 

Phoenix, AZ 

Mr. Gerald DeGrow, Superintendent 
2526 W. Osborn Rd. 
Phoenix, AZ 85017 

Jackson, MS 

Dr. Brandon Sparkman, Superintendent 
Box 2338 

Jackson, MS 39205 

Birmingham, AL 

Mr. Henry C. Sparks, Superintendent 
Board of Education Drawer 10007 
Birmingham, AL 35202 

Philadelphia, PA 

Mr. Matthew W. Costanzo, Superintendent 
Parkway at 21st St. 
Philadelphia, PA 19103 

Boston, MA 

Mr. William J. Leary, Superintendent 
13 Beacon St. 
Boston, MA 02108 

Louisville, KY 

Dr. Newman Walker, Superintendent 
Fourth at Broadway 
Louisville, KY 40202 

Tulsa, OK 

Mr. Tom Summers, Superintendent 
Tulsa, OK 74101 

Raleigh, NC 
Mr. C. L. Hooper 
City Superintendent 
601 Devereux St. 
Raleigh, NC 27605 

Augusta, GA 

Mr. H. M. Duncan, Superintendent 
Richmond County Schools 
2083 Heckle St. 
Augusta, GA 30904 

Omaha, NE 

Mr. Owen A. Knutzen, Superintendent 
3902 Davenport 
Omaha, NE 68131 

Santa Fe, NM 

Mr. Phillip Bebo, Superintendent 
610 Alta Vista 
Santa Fe, NM 87501 

Seattle, WA 

Mr. Forbes Bottomly, Superintendent 
Seattle School District No. 1 
815 Fourth Ave. N. 
Seattle, WA 98109 

Kansas City, KS 

Mr. O. L. Plucker, Superintendent 
Kansas City Wyandotte Unfd. Dist. 500 
Library Bldg. 
Kansas City, KS 66101 

Wichita, KS 

Dr. Alvin E. Morris, Superintendent 
Wichita Sedgwick Unfd. Dist. 259 
428 S. Broadway 
Wichita, KS 67202 

Madison, WI 

Mr. D. S. Ritchie, Superintendent 
545 W. Dayton 
Madison, WI 53703 

Baltimore, MD 

Mr. Roland N. Patterson, Superintendent 
3 E. 25th St. 
Baltimore, MD 21218 

Charleston, West Virginia 
Dr. K. E. Underwood 
Charleston, West Virginia 25311 

Columbia, SC 

Dr. Claud E. Kitchens, Superintendent 
1616 Richland St. 
Columbia, SC 29201 

Richmond, VA 

Dr. Thomas C. Little, Superintendent 
312 N. 9th St. 
Richmond, VA 23219 

Newark, NJ 

Mr. Edward Pfeffer, Superintendent 
Education Bd. 31 Green St. 
Newark, NJ 07102 

Hartford, CT 

Medill Bair, Superintendent 
249 High St. 
Hartford, CT 06103 

Mobile, AL 

Mr. Harold Collins 

Mobile, AL 36601 



Tallahassee, FL 

Mr. F. W. Ashmore, Superintendent 
P. O. Box 246 
Tallahassee, FL 32302 

Austin, TX 

Dr. Jack L. Davidson 

Superintendent of Austin ISD 

6100 N. Guadalupe 

Austin, TX 78752 

Albuquerque, NM 

Mr. E. Stapleton, Superintendent 
Box 1927 

Albuquerque, NM 87103 

Cheyenne, WY 
Dr. Joe Lutjeharms 
Superintendent, Dist. 1 
Cheyenne, WY 82001 

New York, NY 
Mr. Calvin E. Gross 
Superintendent of Schools 
110 Livingston St. 
Brooklyn, NY 11201 

Denver, CO 

Dr. Allan M. Hosier 

Supervisor 

Development & Evaluation 
Denver Public Schools 
3800 York Street 
Denver, CO 80205 

Montclair, NJ 
Mrs. Judi Granick 

Director, Planning, Research & Evaluation 
Montclair Board of Education 
22 Valley Road 
Montclair, NJ 07042 

Denver, CO 
Mr. I ;ry Beal 

Supervisor, Department of Development & Evaluation 
Denver Public Schools 
900 Grant Street 
Denver, CO 80203 

Nashville, TN 

Dr. Edward Binkley 

Director 

Department of Research & Evaluation 
Metropolitan Public Schools 
2601 Bransford Avenue 
Nashville, TN 37204 

New Orleans, LA 

Dr. Constance C. Dolese 

Director of Secondary Education 

New Orleans Public Schools 

4100 Touro Street 

New Orleans, LA 70122 

St. Louis, MO 
Ms. Sandra Edelman 
Administrative Assistant 
St. Louis Public Schools 
1517 South Theresa 
St. Louis, MO 63104 

Downey, CA 

Dr. Gordon E. Footman 

Director, Division of Program Evaluation, Research, and 

Pupil Services 
Los Angeles County Superintendent of Schools 
9300 East Imperial Highway 
Downey, CA 90242 

Portland, OR 
Dr. Walter Hathaway 
Portland Public Schools 
P. O. Box 3107 
Portland, OR 97208 

Madison, WI 
Dr. Darwin Kaufman 
Evaluation Coordinator 
Department of Public Instruction 
126 Langdon Street, Rm. 308 
Madison, WI 53702 

Spokane, WA 

Ms. Sandra Meacham 

Evaluation and Measurement Specialist 
Central Valley School District #356 
123 South Bowdish Road 
Spokane, WA 99206 

Monterey, CA 

Dr. Lloyd Swanson 

Director of Evaluation 

Monterey Peninsula Unified 

School District 

P. O. Box 1031 

Monterey, CA 93940 

Lancaster, PA 

Dr. John Tardibuono 

Director, Project 81 

School District of Lancaster 

225 West Orange Street 

Lancaster, PA 17604 

Pontiac, MI 

Dr. William Veitch 

Assistant Director of Systematic Studies 

Oakland Schools 

2100 Pontiac Lake Road 

Pontiac, MI 48054 

Little Rock, AR 
Dr. Carolyn Weddle 
Assistant Superintendent of 

Program Implementation 
Little Rock School District 
West Markham and Izard 
Little Rock, AR 72201 

Fourth Grade 
Scoring Guide 

Secondary Trait - Spelling 

Level 1 - If a paper does not qualify for a two, a score of one is given. 

      2 - Spelling is generally reasonable and indicates some concept 
          of sound-letter relationship. The following words are spelled 
          correctly: 

          High frequency words (53): 

              a      are    boy    girl   it     the 
              an     at     can    he     of     they 
              and    be     for    in     on     to 
              was    you    I      is     that 

          plurals of nouns by adding s (36) 

          basic color words (31) 

          number names through 10 (32) 

The writer adheres to the following conventions: 

          spells initial and final consonant sounds (B. 15 & 16) 

          spells short vowel sound in a word (B. 17) 

          spells long vowel sound in a word (B. 18) 

          spells phonetically regular words with CVC pattern (B. 19) 

          spells one syllable words with VC final e pattern (B. 20) 

          spells words with variant sounds of /c/ and /g/ (B. 21) 

          spells the initial sound in words using consonant blends (B. 22) 

          spells number names through one hundred, days of week and 
          months of year (B. 33, 34, 35) 

          spells plurals of nouns by adding es 

          spells words with final y changed to i before adding es 

          spells verbs with ing 

          adds ing suffix: 

              directly to root word (B. 39) 
              doubles final consonant (B. 40) 
              drops final e (B. 41) 

          spells high frequency words (B. 54): 

              am     this   her    said   all    with 
              his    what   but    get    like   who 
              by     how    little she    these  had 
              we     some   will   have   why    their 
              do     up     not    then   down   write 
              one    when   each   from   out    your 

          spells holidays, seasons of the year, and frequently used 
          school and community words 

Fourth Grade 
Scoring Guide 

Secondary Trait - Capitalization 

Level 1 - If the paper does not qualify for a two, a score of one is given. 

      2 - Errors in capitalization are present. Shows some concept of 
          capitalization. Capitals are always present in the following 
          instances: 

          Proper nouns (1) 

          Beginning of sentences (2a) 

          Pronoun "I" (2b) 

          Abbreviations (Mr., months, St., Rd., Ave., days of week, 
          post office) (2c) 

          Initials (2d) 

      3 - The paper adheres to the following conventions: 

          Titles (books, poems, reports, stories) (2e) 
          Titles of persons (Mother, Father, Aunt, Uncle) (2f) 
          Titles of address as a part of proper nouns (2g) 
          Heading, salutation, and closing of letters (A) 



Fourth Grade 
Scoring Guide 

Secondary Trait - Punctuation 

Level 1 - If a paper does not qualify for a two, a score of one is given. 

      2 - Errors are present, but the responses adhere to the following 
          conventions: 

          Appropriate end punctuation (1a, 2, 3) 

          Uses comma correctly between days of month and year, after 
          greeting and closing of a letter, between names of cities and 
          states (4d1) 

          Uses colon appropriately in time of day (5e) 

      3 - The paper adheres to the following conventions: 

          Uses comma in words in a series (4, a, 1) 
          Underlines titles of books (7s) 

          Uses apostrophe with possessive singular nouns (10e) 

          Indents and paragraphs: 

              Heading and closing of letter (11, a, 1) 

              Beginning of a paragraph (11, a, 2) 

Fourth Grade 
Scoring Guide 

Secondary Trait - Syntax 

Level 1 - If a paper does not qualify for a two, a score of one is given. 

          uses appropriate subject pronouns 
          (I, we, he, she, it, you, they) (E. 12) 

          uses connecting words (and, but, or) (E. 13) 

          uses a, an, the appropriately (E. 19) 

      3 - [illegible] one complete sentence [illegible] to all 
          of the following conventions: 

          uses appropriate verb tense (E. 7) 

          uses appropriate noun form (E. 9) 

          uses appropriate form of singular possessive (E. 11) 

          uses correctly formed negative statements (E. 14) 

          uses appropriate word order (E. 17) 

          uses appropriate object pronouns (E. 16) 

          uses appropriate form of plural possessive 

          uses simple predicate to agree with simple subject (E. 18) 

          uses plural possessive nouns (E. 22) 

          uses appropriate helping and main verb combinations (E. 23) 

          uses comparative and superlative forms of adjective (E. 24) 

          uses appropriate demonstrative pronouns (E. 25) 

          uses appropriate inflectional endings to express correct 
          verb tense and number (E. 26) 

Key Participants and Consultants 
National Writing Conference 
New Orleans 
July 10 

Carol Robinson 
Albuquerque Public Schools 

Jim Hertzog 
Pennsylvania State Department of Education 

Kenneth Loewe 
Florida State Department of Education 

Gail J. Ames 
Delaware Department of Public Instruction 

Walter Hathaway 
Portland Public Schools 

Ray Crisp 
Wichita Public Schools 

Mary L. Crovo 
Maryland State Department of Education 

William J. Brown 
North Carolina State Department of Education 

Vicki Fredrick 
Wisconsin Department of Public Instruction 

Carol Ann Greenhalgh 
Texas State Department of Education 

Alan Purves 
University of Illinois, Urbana 

William Lutz 
Rutgers University 

Ina Mullis 
NAEP 

Other Participants 
National Writing Conference 
New Orleans 
July 10 

Louise A. Cobb 
Louisiana State Department of Education 

Marvin Zimmerman 
Little Rock School District 

Donna L. Nola 
Louisiana State Department of Education 

Cornelia B. Barnes 
Louisiana State Department of Education 

Joseph Williams, Jr. 
Louisiana State Department of Education 

Jean Halsell 
Ouachita Parish Schools - LA 

Ruth Berlin 
Ouachita Parish Schools - LA 

Hugh Peck 
Louisiana State Department of Education 

Margaret M. Ruska 
Austin Independent School District 

Rebecca Christian 
Louisiana State Department of Education 

Jimmie Steptoe 
Louisiana State Department of Education 

Sandra Konrad 
Arkansas State Department of Education 